Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

Kalluri, Tarun; Majumder, Bodhisattwa Prasad; Chandraker, Manmohan

arXiv:2403.05535 (cs)

[Submitted on 8 Mar 2024 (v1), last revised 6 Jun 2024 (this version, v3)]

Abstract: We introduce LaGTran, a novel framework that uses text supervision to guide robust transfer of discriminative knowledge from labeled source data to unlabeled target data across domain gaps. While unsupervised adaptation methods have been established to address this problem, they struggle with challenging domain shifts because they operate exclusively in pixel space. Motivated by our observation that the semantically richer text modality has more favorable transfer properties, we devise a transfer mechanism that uses a source-trained text classifier to generate predictions on the target text descriptions, and then uses these predictions as supervision for the corresponding images. Our language-guided approach is surprisingly simple, yet it significantly outperforms all prior approaches on challenging datasets such as GeoNet and DomainNet, demonstrating its effectiveness. To extend the scope of our study beyond images, we introduce a new benchmark called Ego2Exo to study ego-exo transfer in videos, and find that LaGTran yields significant gains in this highly challenging transfer setting. Code, models, and the proposed datasets are publicly available at this https URL.
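The transfer mechanism described in the abstract (train a text classifier on source captions, predict on target captions, and use those predictions as labels for the paired target images) can be illustrated with a minimal sketch. It assumes every image comes with a paired text description, and it swaps in a small multinomial Naive Bayes classifier over bag-of-words features as a stand-in for the source-trained text classifier; the actual LaGTran text model, data format, and training details are not reproduced here. All class and function names (`NaiveBayesTextClassifier`, `language_guided_pseudo_labels`, `build_image_training_set`) and the toy captions are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization (a stand-in for a real text encoder).
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log class prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in source captions
                # Laplace-smoothed log likelihood of the token under this class.
                score += math.log(
                    (self.word_counts[label][tok] + 1)
                    / (self.total_words[label] + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def language_guided_pseudo_labels(source, target):
    """source: [(image_id, caption, label)], target: [(image_id, caption)]."""
    clf = NaiveBayesTextClassifier().fit(
        [caption for _, caption, _ in source], [label for _, _, label in source]
    )
    # Predictions on target *text* become labels for the paired target *images*.
    return {img: clf.predict(caption) for img, caption in target}


def build_image_training_set(source, pseudo_labels):
    # Labeled source images plus target images carrying text-derived pseudo-labels.
    return [(img, label) for img, _, label in source] + sorted(pseudo_labels.items())


source = [
    ("src_001.jpg", "a red barn in a farm field", "barn"),
    ("src_002.jpg", "old wooden barn near the farm", "barn"),
    ("src_003.jpg", "a lighthouse on the rocky coast", "lighthouse"),
    ("src_004.jpg", "white lighthouse by the sea coast", "lighthouse"),
]
target = [
    ("tgt_101.jpg", "farm barn photographed at dusk"),
    ("tgt_102.jpg", "lighthouse beam over the coast at night"),
]

pseudo = language_guided_pseudo_labels(source, target)
print(pseudo)  # {'tgt_101.jpg': 'barn', 'tgt_102.jpg': 'lighthouse'}
train_set = build_image_training_set(source, pseudo)
print(len(train_set))  # 6
```

In this sketch the image model itself is never touched by the target labels' origin: the text classifier does all of the cross-domain work, and the combined `train_set` would then be fed to an ordinary supervised image (or video) classifier. That downstream training step is omitted here.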

Comments:	ICML 2024 Camera-Ready. Project Page and Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2403.05535 [cs.CV]
	(or arXiv:2403.05535v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.05535

Submission history

From: Tarun Kalluri
[v1] Fri, 8 Mar 2024 18:58:46 UTC (6,368 KB)
[v2] Tue, 14 May 2024 03:20:55 UTC (6,360 KB)
[v3] Thu, 6 Jun 2024 01:44:48 UTC (6,375 KB)