FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

Bulat, Adrian; Guerrero, Ricardo; Martinez, Brais; Tzimiropoulos, Georgios

arXiv:2210.04845 (cs)

[Submitted on 10 Oct 2022 (v1), last revised 20 Aug 2023 (this version, v2)]

Abstract: This paper is on Few-Shot Object Detection (FSOD), where given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be used as is, without requiring any fine-tuning at test time, (b) it must be able to process an arbitrary number of novel objects concurrently while supporting an arbitrary number of examples from each class, and (c) it must achieve accuracy comparable to a closed system. Towards satisfying (a)-(c), in this work, we make the following contributions: We introduce, for the first time, a simple, yet powerful, few-shot detection transformer (FS-DETR) based on visual prompting that can address both desiderata (a) and (b). Our system builds upon the DETR framework, extending it based on two key ideas: (1) feed the provided visual templates of the novel classes as visual prompts during test time, and (2) "stamp" these prompts with pseudo-class embeddings (akin to soft prompting), which are then predicted at the output of the decoder. Importantly, we show that our system is not only more flexible than existing methods, but also makes a step towards satisfying desideratum (c). Specifically, it is significantly more accurate than all methods that do not require fine-tuning and even matches and outperforms the current state-of-the-art fine-tuning based methods on the most well-established benchmarks (PASCAL VOC & MSCOCO).
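The two key ideas above (templates fed as prompts, and prompts "stamped" with pseudo-class embeddings that are predicted at the decoder output) can be illustrated with a toy sketch. This is not the authors' implementation. The following are all assumptions made for illustration: additive stamping, concatenating prompts with the object queries, one self-attention plus one cross-attention step standing in for the DETR decoder, dot-product scoring against the active pseudo-class embeddings, random placeholder weights, and every name and size (`stamp_prompts`, `pseudo_class_table`, `D`, and so on).

```python
import math
import random

random.seed(0)

# All sizes are hypothetical placeholders, not values from the paper.
D = 8            # embedding width
NUM_PSEUDO = 5   # size of the pseudo-class embedding table
NUM_QUERIES = 6  # number of DETR-style object queries


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(D)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Learned in a real system; random placeholders here (assumption).
pseudo_class_table = [rand_vec() for _ in range(NUM_PSEUDO)]
object_queries = [rand_vec() for _ in range(NUM_QUERIES)]


def stamp_prompts(templates_per_class):
    """Add pseudo-class embedding c to every template of the c-th novel class.

    templates_per_class[c] is a list of template feature vectors for the
    c-th novel class in this episode (any number of shots per class).
    Returns the stamped prompts and their pseudo-class labels.
    """
    if len(templates_per_class) > NUM_PSEUDO:
        raise ValueError("more novel classes than pseudo-class embeddings")
    prompts, labels = [], []
    for c, templates in enumerate(templates_per_class):
        for t in templates:
            prompts.append(add(t, pseudo_class_table[c]))
            labels.append(c)
    return prompts, labels


def attend(tokens, context):
    """Toy single-head attention with a residual connection (no learned projections)."""
    out = []
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(D) for k in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [sum(w * k[i] for w, k in zip(weights, context)) / total
                 for i in range(D)]
        out.append(add(q, mixed))
    return out


def predict_pseudo_class(token, num_classes):
    """Pick the active pseudo-class whose embedding best matches the token."""
    scores = [dot(token, pseudo_class_table[c]) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: scores[c])


# Usage: two novel classes, with 3 shots and 1 shot respectively.
templates = [[rand_vec() for _ in range(3)], [rand_vec()]]
prompts, labels = stamp_prompts(templates)
image_memory = [rand_vec() for _ in range(12)]  # stand-in for encoder output

tokens = object_queries + prompts
mixed = attend(tokens, tokens)          # queries and prompts see each other
decoded = attend(mixed, image_memory)   # then attend to the image features
predictions = [predict_pseudo_class(tok, len(templates))
               for tok in decoded[:NUM_QUERIES]]
```

Because the class identity lives in the pseudo-class index rather than in a fixed classifier row, the same weights can, in this toy setting, serve any set of novel classes at test time without re-training. The table size only caps how many classes one episode can hold.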

Comments:	Accepted at ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.04845 [cs.CV]
	(or arXiv:2210.04845v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.04845

Submission history

From: Adrian Bulat
[v1] Mon, 10 Oct 2022 17:03:03 UTC (5,478 KB)
[v2] Sun, 20 Aug 2023 12:23:49 UTC (5,472 KB)
