Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

Zhang, Han; Shen, Hongwei; Qiu, Yiming; Jiang, Yunjiang; Wang, Songlin; Xu, Sulong; Xiao, Yun; Long, Bo; Yang, Wen-Yun

doi:10.1145/3404835.3462988

arXiv:2105.03933 (cs)

[Submitted on 9 May 2021 (v1), last revised 28 May 2021 (this version, v3)]

Abstract: An embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component of state-of-the-art deep retrieval systems. Traditional approaches, which often separate the two steps of embedding learning and index building, incur additional indexing time and degraded retrieval accuracy. In this paper, we propose a novel method called Poeem, which stands for product quantization based embedding index jointly trained with deep retrieval model, to unify the two separate steps within end-to-end training by utilizing a few techniques, including the gradient straight-through estimator, a warm start strategy, optimal space decomposition, and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open-sourced our approach for the sake of comparison and reproducibility.
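
As a rough illustration of the product quantization step named in the abstract, the following pure-Python sketch splits a vector into subvectors, maps each subvector to its nearest codebook centroid, and reconstructs an approximation from the resulting codes. The toy codebook values and the names `pq_encode`, `pq_decode`, and `TOY_CODEBOOKS` are hypothetical and chosen only for illustration. This is not the Poeem implementation; it omits the joint training, the warm start, and the Givens rotation.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]  # the centroids of one subspace


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Return one centroid index per subspace (the compact PQ code)."""
    num_subspaces = len(codebooks)
    if len(vector) % num_subspaces != 0:
        raise ValueError("vector length must be divisible by the number of subspaces")
    sub_dim = len(vector) // num_subspaces
    codes = []
    for d, codebook in enumerate(codebooks):
        sub = vector[d * sub_dim:(d + 1) * sub_dim]
        # Nearest-centroid lookup: a non-differentiable argmin.
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Concatenate the selected centroids to approximate the original vector."""
    approx: Vector = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Hypothetical toy setup: a 4-dimensional embedding, 2 subspaces, 2 centroids each.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

codes = pq_encode([0.9, 0.8, 0.1, 0.7], TOY_CODEBOOKS)
approx = pq_decode(codes, TOY_CODEBOOKS)
print(codes, approx)
```

The nearest-centroid lookup in `pq_encode` has no useful gradient, which is presumably where the gradient straight-through estimator mentioned in the abstract comes in during end-to-end training; the sketch above covers only the forward quantization.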

Comments:	4 pages, 4 figures; accepted by SIGIR 2021
Subjects:	Information Retrieval (cs.IR)
ACM classes:	H.3.3
Cite as:	arXiv:2105.03933 [cs.IR]
	(or arXiv:2105.03933v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2105.03933
Related DOI:	https://doi.org/10.1145/3404835.3462988

Submission history

From: Han Zhang
[v1] Sun, 9 May 2021 13:17:31 UTC (5,168 KB)
[v2] Tue, 11 May 2021 13:09:21 UTC (5,168 KB)
[v3] Fri, 28 May 2021 08:32:08 UTC (5,168 KB)
