Structured Scene Memory for Vision-Language Navigation

Wang, Hanqing; Wang, Wenguan; Liang, Wei; Xiong, Caiming; Shen, Jianbing

Computer Science > Computer Vision and Pattern Recognition

arXiv:2103.03454 (cs)

[Submitted on 5 Mar 2021]

Abstract: Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions. However, current VLN agents simply store their past experiences/observations as latent states in recurrent networks, which fails to capture environment layouts and hampers long-term planning. To address these limitations, we propose a crucial architecture, called Structured Scene Memory (SSM). It is compartmentalized enough to accurately memorize the percepts during navigation. It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment. SSM has a collect-read controller that adaptively collects information to support current decision making and mimics iterative algorithms for long-range reasoning. As SSM provides a complete action space, i.e., all the navigable places on the map, a frontier-exploration-based navigation decision-making strategy is introduced to enable efficient, global planning. Experimental results on two VLN datasets (R2R and R4R) show that our method achieves state-of-the-art performance on several metrics.
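The idea of a "complete action space" for frontier exploration can be illustrated with a toy topological memory. The following is a minimal sketch under stated assumptions, not the SSM implementation: places are plain string IDs, edges mean "navigable from", and the agent may choose any visited-but-unexplored (frontier) place on the map rather than only an adjacent one, then travel there along a shortest path through the memorized graph. The names `SceneGraphMemory`, `observe`, `frontier`, `shortest_path`, `choose_frontier`, and the `score` callable are hypothetical; a real agent would score frontiers with learned, instruction-conditioned features, and SSM's visual/geometric node features and collect-read controller are not modelled here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SceneGraphMemory:
    """Hypothetical topological memory: nodes are places, edges are navigability."""
    edges: dict = field(default_factory=dict)   # place -> set of navigable neighbours
    visited: set = field(default_factory=set)

    def observe(self, node, navigable_neighbours):
        # Mark the current place as visited and add its navigable neighbours to the map.
        self.visited.add(node)
        self.edges.setdefault(node, set()).update(navigable_neighbours)
        for nb in navigable_neighbours:
            self.edges.setdefault(nb, set()).add(node)

    def frontier(self):
        # Places that are known to be navigable but have not been visited yet.
        return {n for n in self.edges if n not in self.visited}

    def shortest_path(self, start, goal):
        # Breadth-first search over the memorized graph (unit edge costs assumed).
        parents = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                break
            for nb in sorted(self.edges.get(cur, ())):
                if nb not in parents:
                    parents[nb] = cur
                    queue.append(nb)
        if goal not in parents:
            return None
        path, node = [], goal
        while node is not None:
            path.append(node)
            node = parents[node]
        return path[::-1]


def choose_frontier(memory, current, score):
    # Global action space: every frontier place on the map, not just local neighbours.
    candidates = memory.frontier()
    if not candidates:
        return None, None
    # `score` is a placeholder for a learned, instruction-conditioned scoring function.
    target = max(sorted(candidates), key=score)
    return target, memory.shortest_path(current, target)
```

For example, after visiting `A` (which sees `B` and `C`) and then `B` (which sees `D`), both `C` and `D` are frontier places. If `C` scores higher, the agent can backtrack to it via `A` in a single planning step:

```python
mem = SceneGraphMemory()
mem.observe("A", {"B", "C"})
mem.observe("B", {"A", "D"})
target, path = choose_frontier(mem, "B", {"C": 0.9, "D": 0.2}.get)
# target == "C", path == ["B", "A", "C"]
```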

Comments:	Accepted to CVPR 2021; implementation will be available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2103.03454 [cs.CV]
	(or arXiv:2103.03454v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.03454

Submission history

From: Wenguan Wang
[v1] Fri, 5 Mar 2021 03:41:00 UTC (14,259 KB)