MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

Zheng, Xiaoyang; Wang, Zilong; Xu, Ke; Li, Sen; Zhuang, Tao; Liu, Qingwen; Zeng, Xiaoyi

doi:10.1145/3543873.3584627

arXiv:2301.12646 (cs)

[Submitted on 30 Jan 2023 (v1), last revised 18 Feb 2023 (this version, v2)]

Abstract: Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the following ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention to titles or images varies across products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images across products. Furthermore, in e-commerce search, user queries tend to be brief and thus lead to significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the query representations and improve text-to-multimodal matching. To this end, we present a novel vision-language (V+L) pre-training method to exploit the multimodal information of (user query, product title, product image). Extensive experiments demonstrate that our retrieval-specific pre-training model (referred to as MAKE) outperforms existing V+L pre-training methods on the text-to-multimodal retrieval task. MAKE has been deployed online and brings major improvements to the retrieval system of Taobao Search.
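
The abstract does not specify how the Modal Adaptation module computes its weights, so the following is only an illustrative sketch of the general idea of per-product modality weighting, not the authors' method. It assumes a hypothetical softmax gate with one scoring vector per modality (`gate_title`, `gate_image`), fuses the title and image embeddings as a weighted sum, and scores the fused product against a query by cosine similarity. All names, dimensions, and values are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(title_vec, image_vec, gate_title, gate_image):
    """Weighted sum of title and image embeddings.

    Assumption: each modality is scored by its own (hypothetical) gate
    vector, and the two scores are normalized with a softmax so the
    weights are positive and sum to 1 for every product.
    """
    w_title, w_image = softmax(
        [dot(gate_title, title_vec), dot(gate_image, image_vec)]
    )
    fused = [w_title * t + w_image * i for t, i in zip(title_vec, image_vec)]
    return fused, (w_title, w_image)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Toy 2-dimensional embeddings (made-up values).
query = [1.0, 0.0]
title = [1.0, 0.0]
image = [0.0, 1.0]

# With all-zero gates both modalities receive equal weight.
fused_equal, weights_equal = fuse(title, image, [0.0, 0.0], [0.0, 0.0])

# A gate that scores the title higher shifts weight toward the title.
fused_title, weights_title = fuse(title, image, [2.0, 0.0], [0.0, 0.0])

score_equal = cosine(query, fused_equal)
score_title = cosine(query, fused_title)
```

In this toy setup the query matches the title exactly, so shifting weight toward the title raises the query-to-product similarity. In a trained model the gate parameters would be learned rather than set by hand.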

Comments:	5 pages, accepted to The Industry Track of the Web Conference 2023
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2301.12646 [cs.IR]
	(or arXiv:2301.12646v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2301.12646
Related DOI:	https://doi.org/10.1145/3543873.3584627

Submission history

From: Xiaoyang Zheng
[v1] Mon, 30 Jan 2023 03:59:36 UTC (2,283 KB)
[v2] Sat, 18 Feb 2023 09:30:47 UTC (2,283 KB)
