Human Object Interaction Detection Primed with Context.

M Antoun, DC Asmar - VISIGRAPP (5: VISAPP), 2023 - scitepress.org
Abstract
Recognizing Human-Object Interaction (HOI) in images is a difficult yet fundamental requirement for scene understanding. Despite the significant advances deep learning has achieved in this field, the performance of state-of-the-art HOI detection systems is still very low. Contextual information about the scene has been shown to improve prediction. However, most works that use semantic features rely on general-purpose word embedding models to represent objects or actions, rather than on contextual embeddings. Motivated by evidence from human psychology, this paper proposes contextualizing actions by pairing each verb with its corresponding object at an early stage. The proposed system consists of two streams: a semantic memory stream, in which verb-object pairs are represented by their feature vectors in a graph network, and an episodic memory stream, in which human-object interactions are represented by their visual features. Experimental results indicate that the proposed model achieves comparable results on the HICO-DET dataset with a pretrained object detector and superior results on HICO-DET with a finetuned detector.
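The abstract does not specify how the graph is built, which features the nodes carry, or how the two streams are fused. Purely as an illustration, the Python sketch below shows one way a two-stream scoring step could look under assumptions of our own. Verb-object pairs become graph nodes that are linked when they share a verb or an object. A single parameter-free neighbour-averaging step stands in for the learned graph network of the semantic stream. A dot product fuses each contextualized verb-object node with a visual feature vector for a detected human-object pair, which stands in for the episodic stream. The functions `build_pair_graph`, `propagate`, and `score_interactions`, the linking rule, and the toy vectors are all hypothetical. This is not the authors' implementation.

```python
from itertools import combinations


def build_pair_graph(pairs):
    """Link two (verb, object) nodes if they share a verb or an object.

    Hypothetical linking rule; every node also gets a self-loop, as in a GCN.
    """
    adj = {p: {p} for p in pairs}
    for a, b in combinations(pairs, 2):
        if a[0] == b[0] or a[1] == b[1]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def propagate(features, adj):
    """One parameter-free propagation step: average each node with its neighbours.

    Stands in for a learned graph-network layer (assumption).
    """
    out = {}
    for node, nbrs in adj.items():
        dim = len(features[node])
        out[node] = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
    return out


def score_interactions(visual_feat, detected_object, semantic_feats):
    """Score each verb for one human-object pair.

    Only verb-object nodes whose object matches the detected object are scored,
    using a dot product between the visual (episodic) feature and the
    contextualized semantic node feature (hypothetical fusion rule).
    """
    scores = {}
    for (verb, obj), vec in semantic_feats.items():
        if obj == detected_object:
            scores[verb] = sum(v * s for v, s in zip(visual_feat, vec))
    return scores


# Toy usage with made-up 2-D node features and a made-up visual feature.
pairs = [("ride", "bicycle"), ("hold", "bicycle"), ("ride", "horse"), ("feed", "horse")]
features = {
    ("ride", "bicycle"): [1.0, 0.0],
    ("hold", "bicycle"): [0.0, 1.0],
    ("ride", "horse"): [1.0, 1.0],
    ("feed", "horse"): [0.0, 2.0],
}
adj = build_pair_graph(pairs)
context_feats = propagate(features, adj)
scores = score_interactions([1.0, 0.2], "bicycle", context_feats)
print(max(scores, key=scores.get), scores)
```

Pairing verbs with objects before propagation means that "ride" attached to a bicycle and "ride" attached to a horse are separate nodes with different neighbourhoods. This is the intuition behind contextualizing actions early rather than embedding each verb on its own. In a trained system, the averaging step and the dot product would be replaced by learned layers.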