CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

Liu, Runtao; Liu, Chenxi; Bai, Yutong; Yuille, Alan

arXiv:1901.00850 (cs)

[Submitted on 3 Jan 2019 (v1), last revised 6 Apr 2019 (this version, v2)]

Abstract: Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language. Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process. To address these issues and complement similar efforts in visual question answering, we build CLEVR-Ref+, a synthetic diagnostic dataset for referring expression comprehension. The precise locations and attributes of the objects are readily available, and the referring expressions are automatically associated with functional programs. The synthetic nature allows control over dataset bias (through sampling strategy), and the modular programs enable intermediate reasoning ground truth without human annotators.
In addition to evaluating several state-of-the-art models on CLEVR-Ref+, we also propose IEP-Ref, a module network approach that significantly outperforms other models on our dataset. In particular, we present two interesting and important findings using IEP-Ref: (1) the module trained to transform feature maps into segmentation masks can be attached to any intermediate module to reveal the entire reasoning process step-by-step; (2) even if all training data has at least one object referred, IEP-Ref can correctly predict no-foreground when presented with false-premise referring expressions. To the best of our knowledge, this is the first direct and quantitative proof that neural modules behave in the way they are intended.
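To make the idea of a functional program with intermediate ground truth concrete, the following is a minimal symbolic sketch and not the dataset's actual program format or the IEP-Ref model. The scene layout, the `filter_attr` helper, and `run_program` are hypothetical names chosen for illustration; the real neural modules operate on image feature maps rather than on object indices. The sketch shows how each program step yields an inspectable intermediate result, and how a false-premise expression naturally produces an empty referent set.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical symbolic scene: one attribute dict per object.
Scene = List[Dict[str, str]]
# A program step maps (scene, currently selected indices) to new indices.
Step = Callable[[Scene, List[int]], List[int]]


def filter_attr(key: str, value: str) -> Step:
    """Build a step that keeps only objects whose attribute matches."""
    def step(scene: Scene, selected: List[int]) -> List[int]:
        return [i for i in selected if scene[i][key] == value]
    return step


def run_program(scene: Scene, program: List[Step]) -> Tuple[List[int], List[List[int]]]:
    """Run the steps in order and record the selection after each one."""
    selected = list(range(len(scene)))
    trace: List[List[int]] = []
    for step in program:
        selected = step(scene, selected)
        trace.append(list(selected))  # intermediate ground truth
    return selected, trace


scene: Scene = [
    {"color": "red", "shape": "cube", "size": "large"},
    {"color": "red", "shape": "sphere", "size": "small"},
    {"color": "blue", "shape": "cube", "size": "small"},
]

# "the red cubes": filter by color, then by shape.
referred, trace = run_program(
    scene, [filter_attr("color", "red"), filter_attr("shape", "cube")]
)

# False-premise expression "the green spheres": nothing is referred.
empty, _ = run_program(
    scene, [filter_attr("color", "green"), filter_attr("shape", "sphere")]
)
```

Here `trace` plays the role of the step-by-step reasoning record: in this toy setting it holds object indices, whereas the abstract describes attaching a segmentation module to intermediate neural modules to obtain masks.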

Comments:	To appear in CVPR 2019. All data and code concerning CLEVR-Ref+ and IEP-Ref have been released at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1901.00850 [cs.CV]
	(or arXiv:1901.00850v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1901.00850

Submission history

From: Chenxi Liu
[v1] Thu, 3 Jan 2019 18:58:06 UTC (5,178 KB)
[v2] Sat, 6 Apr 2019 19:59:25 UTC (5,196 KB)
