arXiv cs.AI · 7d ago · research

OneCanvas: 3D Scene Understanding via Panoramic Reprojection


OneCanvas enables 3D scene understanding in Vision-Language Models by aggregating patch features onto a panoramic canvas using 3D world coordinates. It achieves state-of-the-art results on SQA3D and VSI-Bench, with strong generalization on SPBench, using significantly less training compute than prior methods.
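
To make the aggregation idea concrete, here is a minimal sketch of pooling per-patch features onto a panoramic grid from 3D world coordinates. It is not the paper's implementation: the equirectangular layout, the mean pooling, and the names `equirect_cell` and `aggregate_on_canvas` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def equirect_cell(point, origin, width, height):
    """Map a 3D world point to a (row, col) cell of an equirectangular
    canvas centered at `origin`. Hypothetical helper, not from the paper."""
    dx, dy, dz = (p - o for p, o in zip(point, origin))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        return None  # direction is undefined at the canvas center
    azimuth = math.atan2(dy, dx)      # in [-pi, pi]
    elevation = math.asin(dz / r)     # in [-pi/2, pi/2]
    col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
    row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
    return row, col


def aggregate_on_canvas(points, features, origin, width, height):
    """Average the feature vectors of all patches that land in the same cell.
    Mean pooling is an assumption; the source does not name the reduction."""
    sums = {}
    counts = defaultdict(int)
    for point, feat in zip(points, features):
        cell = equirect_cell(point, origin, width, height)
        if cell is None:
            continue
        if cell not in sums:
            sums[cell] = [0.0] * len(feat)
        sums[cell] = [s + f for s, f in zip(sums[cell], feat)]
        counts[cell] += 1
    return {cell: [s / counts[cell] for s in vec] for cell, vec in sums.items()}


# Two patches along the same viewing direction share a cell; a third points up.
points = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
features = [[1.0, 3.0], [3.0, 5.0], [7.0, 7.0]]
canvas = aggregate_on_canvas(points, features, (0.0, 0.0, 0.0), width=8, height=4)
```

In this sketch, patches seen from different views but lying along the same direction from the canvas center fall into one cell, which is one plausible way a single canvas could merge multi-view features.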

Importance 3/3 · Beats a top-lab benchmark · New feature vs. leaders · arXiv cs.AI · Google DeepMind · Mistral AI · OpenAI · AI agents · Multimodal · Reasoning models
