MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

Wang, Zifeng; Wu, Zhenbang; Agarwal, Dinesh; Sun, Jimeng

arXiv:2210.10163 (cs)

[Submitted on 18 Oct 2022]

Abstract: Existing vision-text contrastive learning methods such as CLIP aim to match paired image and caption embeddings while pushing all other pairs apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude smaller than the general image-caption datasets collected from the internet. Moreover, previous methods encounter many false negatives: images and reports from separate patients may carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning, thus scaling the usable training data combinatorially at low cost. We also propose replacing the InfoNCE loss with a semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We show that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval. Surprisingly, we observe that with only 20K pre-training data, MedCLIP outperforms the state-of-the-art method (which uses around 200K data). Our code is available at this https URL.
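
The abstract does not spell out the form of the semantic matching loss, so the following is only a minimal sketch of one plausible reading. It assumes each image and each text carries a multi-hot vector of medical findings, uses cosine similarity between those vectors as soft targets, and applies a cross-entropy against the softmax of embedding similarities. The function names, the label encoding, and the temperature value are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def soft_targets(image_labels, text_labels):
    """Row i: softmax of label similarity between image i and every text.

    Assumption: labels are multi-hot finding vectors (hypothetical encoding).
    """
    return [softmax([cosine(li, lt) for lt in text_labels])
            for li in image_labels]


def semantic_matching_loss(image_emb, text_emb, image_labels, text_labels,
                           temperature=0.07):
    """Image-to-text cross-entropy against soft semantic targets.

    Unlike InfoNCE, a text that shares findings with an image is not
    treated as a hard negative, even if it comes from another patient.
    """
    targets = soft_targets(image_labels, text_labels)
    loss = 0.0
    for i, v in enumerate(image_emb):
        logits = [cosine(v, t) / temperature for t in text_emb]
        probs = softmax(logits)
        loss -= sum(y * math.log(p) for y, p in zip(targets[i], probs))
    return loss / len(image_emb)


# Two images and three texts: the sets are decoupled, so their sizes differ.
image_labels = [[1, 0], [0, 1]]
text_labels = [[1, 0], [0, 1], [1, 0]]   # texts 0 and 2 share a finding
image_emb = [[1.0, 0.1], [0.1, 1.0]]     # toy embeddings
text_emb = [[0.9, 0.2], [0.2, 0.9], [1.0, 0.0]]

targets = soft_targets(image_labels, text_labels)
loss = semantic_matching_loss(image_emb, text_emb, image_labels, text_labels)
print(targets[0], loss)
```

Because images and texts are matched through their labels rather than through patient-level pairing, any image can be scored against any text, which is one way to read the abstract's claim that the usable training data grows combinatorially.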

Comments:	EMNLP 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2210.10163 [cs.CV]
	(or arXiv:2210.10163v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.10163

Submission history

From: Zifeng Wang
[v1] Tue, 18 Oct 2022 21:06:29 UTC (1,154 KB)
