Computer Science > Computer Vision and Pattern Recognition
[Submitted on 24 Nov 2021 (v1), last revised 5 Jan 2022 (this version, v2)]
Title: Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction
Abstract: Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability. Researchers have made considerable efforts to develop systems that solve RPM problems automatically, often with a black-box, end-to-end convolutional neural network that handles both visual recognition and logical reasoning. Based on these two intrinsic aspects of the RPM problem, visual recognition and logical reasoning, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module that tackle the challenges of real-world visual recognition and subsequent logical reasoning, respectively. For the reasoning module, we further propose a "2+1" formulation that models human thinking in solving RPMs and significantly reduces model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module can yield a set of reasoning rules that model how humans solve RPM problems. To validate the proposed method on real-world applications, we construct an RPM-like Video Prediction (RVP) dataset, in which visual reasoning is performed on RPMs built from real-world video frames. Experimental results on various RPM-like datasets demonstrate that TRIVR achieves significant and consistent performance gains over state-of-the-art models.
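The abstract does not define the "2+1" formulation in detail. One plausible reading, assumed here purely for illustration, is that in each row the first two panels (the "2") determine the third (the "1"), so a rule can be induced from the two complete rows of a 3x3 RPM and applied to the incomplete third row. The sketch below further assumes that a perception stage has already converted every panel into a tuple of integer attributes. The rule set (`progression`, `addition`, `subtraction`) and the functions `induce_rule` and `solve` are hypothetical illustrations, not the authors' TRIVR implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A panel is a tuple of integer attributes (e.g. shape count, size level).
# Assumption: a perception module has already produced these values.
Panel = Tuple[int, ...]

# Hypothetical rule set: each rule maps the first two values in a row
# (the "2") to the third value (the "1").
RULES: Dict[str, Callable[[int, int], int]] = {
    "progression": lambda a, b: 2 * b - a,  # constant rows are progression with step 0
    "addition": lambda a, b: a + b,
    "subtraction": lambda a, b: a - b,
}


def induce_rule(rows: Sequence[Sequence[Panel]], k: int) -> str:
    """Return the first rule that maps panels 1 and 2 to panel 3 on every complete row."""
    for name, rule in RULES.items():
        if all(rule(row[0][k], row[1][k]) == row[2][k] for row in rows):
            return name
    raise ValueError(f"no candidate rule explains attribute {k}")


def solve(context: Sequence[Panel], candidates: Sequence[Panel]) -> Tuple[int, List[str]]:
    """Pick the candidate closest (L1 distance) to the panel predicted by the induced rules."""
    if len(context) != 8:
        raise ValueError("expected 8 context panels in row-major order")
    rows = [context[0:3], context[3:6]]  # the two complete rows
    first, second = context[6], context[7]  # the incomplete third row
    rules = [induce_rule(rows, k) for k in range(len(first))]
    predicted = tuple(RULES[r](first[k], second[k]) for k, r in enumerate(rules))

    def distance(panel: Panel) -> int:
        return sum(abs(p - q) for p, q in zip(panel, predicted))

    best = min(range(len(candidates)), key=lambda i: distance(candidates[i]))
    return best, rules


if __name__ == "__main__":
    context = [(1, 1), (2, 2), (3, 3),
               (2, 2), (3, 2), (4, 4),
               (3, 1), (4, 3)]
    candidates = [(5, 3), (4, 4), (5, 4), (6, 4)]
    print(solve(context, candidates))  # (2, ['progression', 'addition'])
```

Here the first attribute follows a progression and the second follows addition, so the predicted panel is (5, 4) and candidate index 2 is selected. Because the induced rules are returned explicitly, this toy version also hints at why a rule-induction design can expose an interpretable rule per sample, which an end-to-end black-box network does not.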
Submission history
From: Wentao He
[v1] Wed, 24 Nov 2021 06:51:38 UTC (25,254 KB)
[v2] Wed, 5 Jan 2022 04:40:43 UTC (45,504 KB)