Authors:
Yasser Boutaleb (1, 2); Catherine Soladie (1); Nam-Duong Duong (2); Amine Kacete (2); Jérôme Royan (2) and Renaud Seguier (1)
Affiliations:
(1) IETR/CentraleSupelec, Avenue de la Boulaie, 35510 Cesson-Sevigné, France
(2) IRT b-com, 1219 Avenue des Champs Blancs, 35510 Cesson-Sevigné, France
Keyword(s):
First-person Hand Activity, Multi-stream Learning, 3D Hand Skeleton, Hand-crafted Features, Temporal Learning.
Abstract:
Recognizing first-person hand activity is a challenging task, especially when little training data is available. In this paper, we tackle this challenge by proposing a new hybrid learning pipeline for skeleton-based hand activity recognition, composed of three blocks. First, for a given sequence of hand joint positions, spatial features are extracted using a dedicated combination of local and global spatial hand-crafted features. Then, the temporal dependencies are learned using a multi-stream learning strategy. Finally, a hand activity sequence classifier is learned, via our Post-fusion strategy, applied to the previously learned temporal dependencies. Experiments on two real-world data sets show that our approach performs better than state-of-the-art approaches. As a further ablation study, we compared our Post-fusion strategy with three traditional fusion baselines and observed an accuracy improvement of more than 2.4%.
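The data flow of the three blocks can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It makes three assumptions the abstract does not state: pairwise joint distances stand in for the local features, the displacement of the first joint (assumed to be the wrist) relative to the first frame stands in for the global features, and a fixed temporal pooling replaces the learned temporal models. Post-fusion is shown as a plain concatenation of the per-stream summaries, which a classifier would then consume.

```python
import math
from itertools import combinations


def local_features(frame):
    # Hypothetical local descriptor: distances between every pair of joints.
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


def global_features(frame, reference):
    # Hypothetical global descriptor: displacement of joint 0 (assumed to be
    # the wrist) relative to the same joint in a reference frame.
    return [c - r for c, r in zip(frame[0], reference[0])]


def temporal_summary(stream):
    # Stand-in for a learned temporal model (an assumption, not the paper's
    # method): per-dimension mean, followed by the mean absolute
    # frame-to-frame change.
    n = len(stream)
    dims = len(stream[0])
    means = [sum(f[d] for f in stream) / n for d in range(dims)]
    deltas = [
        sum(abs(stream[t + 1][d] - stream[t][d]) for t in range(n - 1))
        / max(n - 1, 1)
        for d in range(dims)
    ]
    return means + deltas


def post_fusion(stream_embeddings):
    # Fuse after the temporal stage: concatenate the per-stream summaries
    # into one vector that a final classifier would take as input.
    return [v for emb in stream_embeddings for v in emb]


def describe_sequence(sequence):
    # One stream per feature family, each summarized independently over time.
    local_stream = [local_features(f) for f in sequence]
    global_stream = [global_features(f, sequence[0]) for f in sequence]
    return post_fusion(
        [temporal_summary(local_stream), temporal_summary(global_stream)]
    )


# Toy sequence: 2 frames, 3 joints each, the second frame shifted along x.
sequence = [
    [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    [(1, 0, 0), (2, 0, 0), (1, 1, 0)],
]
vec = describe_sequence(sequence)
print(len(vec))
```

In this toy input the hand moves rigidly, so the local stream shows no temporal change while the global stream captures the translation. This illustrates why the two feature families are kept in separate streams before fusion.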