What is Learned in Visually Grounded Neural Syntax Acquisition

Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush, Yoav Artzi


Abstract
Visual features are a promising signal for bootstrapping the learning of textual models. However, black-box learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model’s strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model’s predictions as opposed to more complex syntactic reasoning.
Anthology ID: 2020.acl-main.234
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2020
Address: Online
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2615–2635
URL: https://aclanthology.org/2020.acl-main.234
DOI: 10.18653/v1/2020.acl-main.234
Cite (ACL): Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush, and Yoav Artzi. 2020. What is Learned in Visually Grounded Neural Syntax Acquisition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2615–2635, Online. Association for Computational Linguistics.
Cite (Informal): What is Learned in Visually Grounded Neural Syntax Acquisition (Kojima et al., ACL 2020)
PDF: https://aclanthology.org/2020.acl-main.234.pdf
Video: http://slideslive.com/38929126
Code: lil-lab/vgnsl_analysis
Data: MS COCO