HAT: Hierarchical Aggregation Transformers for Person Re-identification

Zhang, Guowen; Zhang, Pingping; Qi, Jinqing; Lu, Huchuan


arXiv:2107.05946 (cs)

[Submitted on 13 Jul 2021 (v1), last revised 14 Jul 2021 (this version, v2)]

Abstract: Recently, with the advance of deep Convolutional Neural Networks (CNNs), person Re-Identification (Re-ID) has achieved great success in various applications. However, because CNNs have limited receptive fields, it remains challenging to extract discriminative global representations for persons captured by non-overlapping cameras. Meanwhile, Transformers demonstrate a strong ability to model long-range dependencies in spatial and sequential data. In this work, we take advantage of both CNNs and Transformers and propose a novel learning framework, named Hierarchical Aggregation Transformer (HAT), for high-performance image-based person Re-ID. To achieve this goal, we first propose a Deeply Supervised Aggregation (DSA) to recurrently aggregate hierarchical features from CNN backbones. With multi-granularity supervision, the DSA can enhance multi-scale features for person retrieval, which is very different from previous methods. Then, we introduce a Transformer-based Feature Calibration (TFC) to integrate low-level detail information as the global prior for high-level semantic information. The proposed TFC is inserted into each level of hierarchical features, resulting in great performance improvements. To the best of our knowledge, this work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID. Comprehensive experiments on four large-scale Re-ID benchmarks demonstrate that our method shows better results than several state-of-the-art methods. The code is released at this https URL.
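The abstract names two ideas, recurrent aggregation of hierarchical features and attention-based calibration of high-level features by low-level ones, without giving their internals. The toy sketch below is not the authors' implementation. It only illustrates the general pattern under loud assumptions: each level is a list of token vectors of equal dimension, calibration is plain scaled dot-product cross-attention with no learned projections plus a residual sum, and the function names `cross_attention`, `calibrate`, and `aggregate` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention; keys and values are the same tokens.

    Assumption: no learned query/key/value projections, unlike a real
    Transformer layer.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def calibrate(high, low):
    """Hypothetical calibration: high-level tokens attend to low-level
    tokens, and the attended result is added back as a residual."""
    attended = cross_attention(high, low)
    return [[h + a for h, a in zip(h_row, a_row)]
            for h_row, a_row in zip(high, attended)]


def aggregate(levels):
    """Recurrent aggregation from the lowest level to the highest.

    Returns one aggregated output per level; in a deeply supervised setup
    each of these would receive its own loss (an assumption here).
    """
    agg = levels[0]
    outputs = [agg]
    for feat in levels[1:]:
        agg = calibrate(feat, agg)
        outputs.append(agg)
    return outputs


# Three toy levels: many low-level tokens, fewer high-level tokens.
levels = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
    [[1.0, 2.0], [2.0, 1.0]],
    [[3.0, 3.0]],
]
outputs = aggregate(levels)
print([len(o) for o in outputs])
```

Each output keeps the token count of its own level while carrying information from all lower levels, which is the property the sketch is meant to show.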

Comments:	This work has been accepted by ACM International Conference on Multimedia 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2107.05946 [cs.CV]
	(or arXiv:2107.05946v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2107.05946

Submission history

From: Pingping Zhang Dr
[v1] Tue, 13 Jul 2021 09:34:54 UTC (5,846 KB)
[v2] Wed, 14 Jul 2021 01:42:35 UTC (5,846 KB)
