PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

Hong, Yining; Yi, Li; Tenenbaum, Joshua B.; Torralba, Antonio; Gan, Chuang

arXiv:2112.05136 (cs)

[Submitted on 9 Dec 2021]

Abstract: A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts and relations, thus playing an important role in the interpretation and organization of visual signals as well as in the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning due to finer-grained concepts, richer geometric relations, and more complex physics. Therefore, to better support part-based conceptual, relational, and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD synthetic images with ground-truth object-level and part-level annotations regarding semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various reasoning types, making them a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

Comments:	NeurIPS 2021. Project page: this http URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2112.05136 [cs.CV]
	(or arXiv:2112.05136v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.05136

Submission history

From: Chuang Gan
[v1] Thu, 9 Dec 2021 18:59:34 UTC (3,514 KB)
