QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding

S Chen, G Luo, Y Zhou, X Sun, G Jiang… - Proceedings of the 32nd …, 2024 - dl.acm.org
Proceedings of the 32nd ACM International Conference on Multimedia, 2024 - dl.acm.org
Visual grounding is the task of locating the object referred to by a natural language description. To reduce annotation costs, recent research has focused on one-stage weakly supervised methods for visual grounding, which typically adopt the anchor-text matching paradigm. Despite the efficiency of this paradigm, we find that anchor representations are often noisy and insufficient to describe object information, which inevitably hinders vision-language alignment. In this paper, we propose a novel query-based one-stage framework for weakly supervised visual grounding, namely QueryMatch. Unlike previous work, QueryMatch represents candidate objects with a set of query features, which inherently establish accurate one-to-one associations with visual objects. QueryMatch thus reformulates weakly supervised visual grounding as a query-text matching problem, which can be optimized via query-based contrastive learning. Building on QueryMatch, we further propose a strategy for effective weakly supervised learning, namely Active Query Selection (AQS). In particular, AQS aims to enhance the effectiveness of query-based contrastive learning by actively selecting high-quality query features, which greatly benefits the weakly supervised learning of QueryMatch. To validate our approach, we conduct extensive experiments on three benchmark datasets across two grounding tasks, i.e., referring expression comprehension (REC) and referring expression segmentation (RES). Experimental results not only show the state-of-the-art performance of QueryMatch on both tasks, e.g., over +5% IoU@0.5 on RefCOCO in REC and over +20% mIoU on RefCOCO in RES, but also confirm the effectiveness of AQS in weakly supervised learning. Source code is available at https://github.com/TensorThinker/QueryMatch.
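To make the query-text matching idea concrete, the following is a minimal, dependency-free sketch of one plausible formulation. It is not the authors' implementation. It assumes that each image yields a small set of query feature vectors, that an image-text pair is scored by its best-matching query, and that training uses an InfoNCE-style loss over the batch. The function names (`match_score`, `best_query`, `query_text_contrastive_loss`), the max-over-queries scoring, and the temperature value are illustrative assumptions; AQS is not modeled here because the abstract does not specify its selection criterion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_score(queries, text):
    """Assumed image-text score: similarity of the best-matching query to the text."""
    return max(cosine(q, text) for q in queries)


def best_query(queries, text):
    """Index of the query most similar to the text (the predicted object at inference)."""
    sims = [cosine(q, text) for q in queries]
    return max(range(len(sims)), key=sims.__getitem__)


def query_text_contrastive_loss(batch_queries, batch_texts, temperature=0.07):
    """InfoNCE-style loss: text i should score highest against the queries of image i.

    batch_queries: list of per-image query sets, each a list of feature vectors.
    batch_texts:   list of text feature vectors, aligned with batch_queries.
    """
    losses = []
    for i, text in enumerate(batch_texts):
        # One logit per image in the batch; the other images act as negatives.
        logits = [match_score(qs, text) / temperature for qs in batch_queries]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

Under these assumptions only image-text pairs are needed as supervision: the loss never uses a ground-truth box, and the grounded object is read off afterwards as the query returned by `best_query`.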