arXiv cs.LG · 7d ago · research

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas enables 3D scene understanding in Vision-Language Models by aggregating patch features onto a single panoramic canvas using 3D world coordinates. It achieves state-of-the-art performance on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using significantly less training compute than existing methods.
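To make the idea of pooling patch features onto one panoramic canvas concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the equirectangular layout, the mean pooling, the grid size, and every name (`panoramic_canvas`, `origin`, `height`, `width`) are assumptions chosen for illustration. It assumes each patch already has a 3D world coordinate and a feature vector.

```python
import numpy as np

def panoramic_canvas(points, feats, origin, height=8, width=16):
    """Mean-pool patch features into an equirectangular grid (illustrative only).

    points: (N, 3) world coordinates of patches (assumed given)
    feats:  (N, C) patch features
    origin: (3,)   hypothetical canvas centre in world coordinates
    """
    d = points - origin
    r = np.linalg.norm(d, axis=1)
    valid = r > 1e-8                      # drop patches located at the centre
    d, r, f = d[valid], r[valid], feats[valid]

    lon = np.arctan2(d[:, 1], d[:, 0])                 # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[:, 2] / r, -1.0, 1.0))   # latitude in [-pi/2, pi/2]

    # Map angles to grid cells; longitude wraps, latitude is clamped.
    col = np.floor((lon + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = np.floor((np.pi / 2 - lat) / np.pi * height).astype(int)
    row = np.clip(row, 0, height - 1)

    canvas = np.zeros((height, width, feats.shape[1]))
    count = np.zeros((height, width))
    np.add.at(canvas, (row, col), f)      # unbuffered scatter-add handles repeats
    np.add.at(count, (row, col), 1)

    filled = count > 0
    canvas[filled] /= count[filled][:, None]   # average features per cell
    return canvas, count
```

Patches from different views that lie in the same direction from `origin` land in the same cell and are averaged, which is one simple way to merge multi-view features into a single 2D grid; the actual method may weight or order them differently.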

Importance 3/3 · Beats a top-lab benchmark · New feature vs. leaders · arXiv cs.LG · Google DeepMind · Allen AI · Hugging Face · Evaluation & benchmarks · Multimodal · Reasoning models
