Structured scene memory for vision-language navigation

H Wang, W Wang, W Liang… - Proceedings of the IEEE/CVF Conference on Computer Vision and …, 2021 - openaccess.thecvf.com
Abstract
Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions. However, current VLN agents simply store their past experiences/observations as latent states in recurrent networks, failing to capture environment layouts and to make long-term plans. To address these limitations, we propose a crucial architecture, called Structured Scene Memory (SSM). It is compartmentalized enough to accurately memorize the percepts during navigation. It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment. SSM has a collect-read controller that adaptively collects information to support current decision making and mimics iterative algorithms for long-range reasoning. As SSM provides a complete action space, i.e., all the navigable places on the map, a frontier-exploration based navigation decision-making strategy is introduced to enable efficient and global planning. Experimental results on two VLN datasets (i.e., R2R and R4R) show that our method achieves state-of-the-art performance on several metrics.
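The core ideas in the abstract — a topological memory over visited viewpoints, a frontier of navigable-but-unvisited places as a global action space, and frontier selection for planning — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: node names, the scalar scores standing in for the agent's learned instruction-conditioned features, and the greedy `plan` heuristic are all assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SceneMemory:
    """Minimal topological scene-memory sketch: nodes are viewpoints,
    edges encode navigability, and unvisited neighbors of visited
    nodes form the frontier used as a global action space."""
    edges: dict = field(default_factory=dict)     # node -> set of neighbors
    visited: set = field(default_factory=set)
    features: dict = field(default_factory=dict)  # node -> scalar stand-in score

    def observe(self, node, neighbors, score):
        """Record a visit: store the node's feature (a scalar here, in
        place of learned visual features) and link all navigable
        neighbors seen from this viewpoint."""
        self.visited.add(node)
        self.features[node] = score
        self.edges.setdefault(node, set()).update(neighbors)
        for n in neighbors:
            self.edges.setdefault(n, set()).add(node)

    def frontier(self):
        """Navigable-but-unvisited nodes: the complete action space."""
        reachable = {n for v in self.visited for n in self.edges[v]}
        return reachable - self.visited

    def plan(self):
        """Greedy frontier selection: pick the frontier node whose best
        visited neighbor scores highest (a stand-in for the paper's
        learned instruction-conditioned scoring)."""
        front = self.frontier()
        if not front:
            return None
        def value(f):
            return max(self.features[v] for v in self.edges[f] if v in self.visited)
        return max(front, key=value)

# Usage: after two steps, the memory exposes two frontier nodes and
# plans toward the one adjacent to the higher-scoring viewpoint.
m = SceneMemory()
m.observe("start", {"a", "b"}, score=0.2)
m.observe("a", {"start", "c"}, score=0.9)
print(m.frontier())  # {"b", "c"}
print(m.plan())      # "c" (its visited neighbor "a" scores 0.9 > 0.2)
```

Unlike a recurrent latent state, this graph retains the full layout explored so far, so the agent can jump planning attention to any frontier node rather than only the locally adjacent ones.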