Deep Learning for Visual Tracking: A Comprehensive Survey

Marvasti-Zadeh, Seyed Mojtaba; Cheng, Li; Ghanei-Yakhdan, Hossein; Kasaei, Shohreh

doi:10.1109/TITS.2020.3046478

arXiv:1912.00535 (cs)

[Submitted on 2 Dec 2019 (v1), last revised 26 Jan 2021 (this version, v2)]

Title: Deep Learning for Visual Tracking: A Comprehensive Survey

Authors: Seyed Mojtaba Marvasti-Zadeh, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei

Abstract: Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which numerous methods have been developed and have demonstrated significant progress in recent years, driven predominantly by deep learning (DL)-based approaches. This survey aims to systematically investigate current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, through quantitative and qualitative critical analyses of these state-of-the-art trackers, their pros and cons under various common scenarios are investigated. The survey may serve as a practical guide for practitioners weighing which method(s) to choose and under what conditions. It also facilitates a discussion of ongoing issues and sheds light on promising research directions.

Comments:	Accepted Manuscript in IEEE Transactions on Intelligent Transportation Systems
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Cite as:	arXiv:1912.00535 [cs.CV]
	(or arXiv:1912.00535v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1912.00535
Related DOI:	https://doi.org/10.1109/TITS.2020.3046478

Submission history

From: Seyed Mojtaba Marvasti-Zadeh
[v1] Mon, 2 Dec 2019 01:05:54 UTC (7,601 KB)
[v2] Tue, 26 Jan 2021 09:05:50 UTC (8,713 KB)
