Sep 4, 2024 · A large-scale multi-modal situated reasoning dataset, scalably collected leveraging 3D scene graphs and vision-language models (VLMs) across a diverse range of ...
We introduce a novel interleaved multi-modal input setting in our benchmark that provides text, images, and point clouds for situations and questions ...
Sep 4, 2024 · Questions with accurate and unique answers, such as existence and counting questions, can be scored according to the 3D scene, the situation, and the question.
Poster. Multi-modal Situated Reasoning in 3D Scenes. Xiongkun Linghu · Xuesong Niu · Jiangyong Huang · Xiaojian (Shawn) Ma · Baoxiong Jia · Siyuan Huang.
SQA3D poses a significant challenge to current multi-modal, especially 3D, reasoning models. We evaluate various state-of-the-art approaches and find that the ...
Sep 4, 2024 · This paper explores multi-modal situated reasoning in 3D scenes, which involves understanding and reasoning about the spatial and semantic ...
Sep 5, 2024 · Situation awareness is essential for understanding and reasoning about 3D scenes in embodied AI agents. However, existing datasets and ...
The research introduces a new large dataset called Multi-modal Situated Question Answering (MSQA) for teaching AI how to understand and navigate 3D scenes, ...