VRP-SAM: SAM with Visual Reference Prompt

Sun, Yanpeng; Chen, Jiahui; Zhang, Shan; Zhang, Xinyu; Chen, Qiang; Zhang, Gang; Ding, Errui; Wang, Jingdong; Li, Zechao

arXiv:2402.17726 (cs)

[Submitted on 27 Feb 2024 (v1), last revised 30 Mar 2024 (this version, v3)]

Abstract: In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM uses annotated reference images to comprehend specific objects and then segments those objects in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM extends the versatility and applicability of the SAM framework while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and to perform cross-domain segmentation. The source code and models will be available at this https URL

Comments:	Accepted by CVPR 2024; The camera-ready version
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.17726 [cs.CV]
	(or arXiv:2402.17726v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.17726

Submission history

From: Yanpeng Sun
[v1] Tue, 27 Feb 2024 17:58:09 UTC (5,418 KB)
[v2] Tue, 26 Mar 2024 08:38:52 UTC (5,423 KB)
[v3] Sat, 30 Mar 2024 09:35:47 UTC (5,423 KB)
