Mining the Benefits of Two-stage and One-stage HOI Detection

Zhang, Aixi; Liao, Yue; Liu, Si; Lu, Miao; Wang, Yongliang; Gao, Chen; Li, Xiaobo

arXiv:2108.05077 (cs)

[Submitted on 11 Aug 2021 (v1), last revised 13 Oct 2021 (this version, v2)]

Abstract: Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years. Recently, one-stage HOI detection methods have become popular. In this paper, we explore the essential pros and cons of two-stage and one-stage methods. We find that conventional two-stage methods mainly struggle to locate positive interactive human-object pairs, while one-stage methods struggle to strike an appropriate trade-off in multi-task learning, i.e., between object detection and interaction classification. A core problem is therefore how to keep the strengths and discard the weaknesses of these two conventional types of methods. To this end, we propose a novel one-stage framework that disentangles human-object detection and interaction classification in a cascade manner. In detail, we first design a human-object pair generator based on a state-of-the-art one-stage HOI detector by removing its interaction classification module or head, and then design a relatively isolated interaction classifier to classify each human-object pair. Each of the two cascade decoders in our proposed framework can thus focus on one specific task, either detection or interaction classification. For the specific implementation, we adopt a transformer-based HOI detector as our base model. The newly introduced disentangling paradigm outperforms existing methods by a large margin, with a significant relative mAP gain of 9.32% on HICO-Det. The source code is available at this https URL.
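The cascade described in the abstract can be illustrated with a minimal Python sketch of the control flow only. It is not the authors' implementation: the names `HumanObjectPair`, `HOIPrediction`, `cascade_hoi`, and both toy stand-in functions are hypothetical, and multiplying the pair score by the interaction score is an assumption made here for illustration. In the paper both steps are transformer decoders; here they are plain callables so the separation of the two tasks is easy to see.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class HumanObjectPair:
    human_box: Box
    object_box: Box
    object_label: str
    score: float  # confidence assigned by the pair generator


@dataclass
class HOIPrediction:
    pair: HumanObjectPair
    interaction: str
    score: float


def cascade_hoi(
    image: object,
    pair_generator: Callable[[object], List[HumanObjectPair]],
    interaction_classifier: Callable[[object, HumanObjectPair], Tuple[str, float]],
) -> List[HOIPrediction]:
    """Generate human-object pairs first, then classify each pair separately."""
    predictions = []
    for pair in pair_generator(image):
        interaction, interaction_score = interaction_classifier(image, pair)
        # Assumption: the final score is the product of the two step scores.
        combined = pair.score * interaction_score
        predictions.append(HOIPrediction(pair, interaction, combined))
    return predictions


# Hypothetical stand-ins for the two learned components.
def toy_pair_generator(image: object) -> List[HumanObjectPair]:
    return [
        HumanObjectPair((0.0, 0.0, 10.0, 20.0), (8.0, 5.0, 14.0, 12.0), "cup", 0.9)
    ]


def toy_interaction_classifier(
    image: object, pair: HumanObjectPair
) -> Tuple[str, float]:
    if pair.object_label == "cup":
        return ("hold", 0.5)
    return ("no_interaction", 1.0)


results = cascade_hoi(None, toy_pair_generator, toy_interaction_classifier)
```

Because the classifier only ever sees pairs that the generator has already proposed, each component can be developed and evaluated on its own task, which is the separation the abstract argues for.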

Comments:	Accepted by NeurIPS 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2108.05077 [cs.CV]
	(or arXiv:2108.05077v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.05077

Submission history

From: Yue Liao
[v1] Wed, 11 Aug 2021 07:38:09 UTC (4,307 KB)
[v2] Wed, 13 Oct 2021 06:09:22 UTC (4,050 KB)
