Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

Saha, Suman; Singh, Gurkirt; Sapienza, Michael; Torr, Philip H. S.; Cuzzolin, Fabio


arXiv:1707.07213 (cs)

[Submitted on 22 Jul 2017 (v1), last revised 6 Aug 2017 (this version, v2)]



Abstract: Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. We cast the action tube extraction as an energy maximisation problem in which configurations of region proposals in each frame are assigned a cost and the best action tubes are selected via two passes of dynamic programming. One pass associates region proposals in space and time for each action category, and another pass is used to solve for the tube's temporal extent and to enforce a smooth label sequence through the video. In addition, by taking advantage of recent work on action foreground-background segmentation, we are able to associate each tube with class-specific segmentations. We demonstrate the performance of our algorithm on the challenging LIRIS-HARL dataset and achieve a new state-of-the-art result which is 14.3 times better than previous methods.
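The two dynamic-programming passes mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the energy terms (class score plus IoU overlap between consecutive boxes in the first pass, and a per-frame score with a fixed label-switching penalty in the second), the function names `link_proposals` and `smooth_labels`, and the parameters `overlap_weight` and `switch_penalty` are all assumptions chosen for illustration. Both functions are standard Viterbi-style recursions over frames.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_proposals(boxes: List[List[Box]], scores: List[List[float]],
                   overlap_weight: float = 1.0) -> List[int]:
    """Pass 1 (sketch): choose one proposal index per frame, maximising the
    sum of class scores plus overlap_weight * IoU between consecutive picks.
    The energy itself is an assumed, simplified form."""
    best = list(scores[0])      # best path energy ending at each proposal
    back: List[List[int]] = []  # back-pointers, one list per frame transition
    for t in range(1, len(boxes)):
        new_best, ptr = [], []
        for j, box_j in enumerate(boxes[t]):
            cands = [best[i] + overlap_weight * iou(box_i, box_j)
                     for i, box_i in enumerate(boxes[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[i_star] + scores[t][j])
            ptr.append(i_star)
        best = new_best
        back.append(ptr)
    # Backtrack from the best final proposal to recover the whole path.
    path = [max(range(len(best)), key=best.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def smooth_labels(action_scores: Sequence[float],
                  switch_penalty: float = 0.5) -> List[int]:
    """Pass 2 (sketch): label each frame 1 (action) or 0 (background),
    maximising the per-frame score minus a penalty for every label change.
    Using (1 - s, s) as the unary terms is an assumption."""
    unary = [(1.0 - s, s) for s in action_scores]  # (background, action)
    best = list(unary[0])
    back: List[List[int]] = []
    for u in unary[1:]:
        new_best, ptr = [], []
        for lab in (0, 1):
            cands = [best[prev] - (switch_penalty if prev != lab else 0.0)
                     for prev in (0, 1)]
            p = 0 if cands[0] >= cands[1] else 1
            new_best.append(cands[p] + u[lab])
            ptr.append(p)
        best = new_best
        back.append(ptr)
    labels = [0 if best[0] >= best[1] else 1]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    return labels[::-1]
```

In this sketch, the frames that `smooth_labels` marks as 1 would give the temporal extent of a tube built from the path returned by `link_proposals`; a larger `switch_penalty` yields fewer label changes and hence longer, smoother segments.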

Comments:	Typos corrected
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1707.07213 [cs.CV]
	(or arXiv:1707.07213v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1707.07213

Submission history

From: Suman Saha
[v1] Sat, 22 Jul 2017 20:46:11 UTC (1,663 KB)
[v2] Sun, 6 Aug 2017 15:22:59 UTC (1,663 KB)
