Video Scene Parsing with Predictive Feature Learning

Jin, Xiaojie; Li, Xin; Xiao, Huaxin; Shen, Xiaohui; Lin, Zhe; Yang, Jimei; Chen, Yunpeng; Dong, Jian; Liu, Luoqi; Jie, Zequn; Feng, Jiashi; Yan, Shuicheng

arXiv:1612.00119 (cs)

[Submitted on 1 Dec 2016 (v1), last revised 13 Dec 2016 (this version, v2)]

Abstract: In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) Predictive feature learning from nearly unlimited unlabeled video data. Unlike existing methods that learn features from single-frame parsing, we learn spatiotemporal discriminative features by forcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical cues for video scene parsing, without requiring extra manual annotations. (2) A prediction steering parsing architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments on two challenging datasets, Cityscapes and CamVid, demonstrate the effectiveness of our methods, showing significant improvement over well-established baselines.
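
The self-supervised setup described in point (1) can be illustrated with a minimal sketch. It only shows how an unlabeled clip yields (historical frames, future frame) training pairs with no manual annotation, and how a prediction could be scored. The names `make_prediction_pairs` and `mean_squared_error`, the window length `history`, the flat-list frame representation, and the squared-error loss are all assumptions made for illustration; the abstract does not specify the network, the loss, or the number of historical frames.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # assumption: a frame flattened to a list of pixel values


def make_prediction_pairs(
    frames: Sequence[Frame], history: int = 4
) -> List[Tuple[List[Frame], Frame]]:
    """Slide a window over an unlabeled clip.

    Each pair is (the `history` frames before time t, the frame at time t),
    so the supervision signal comes from the video itself.
    """
    if history < 1:
        raise ValueError("history must be at least 1")
    pairs = []
    for t in range(history, len(frames)):
        past = list(frames[t - history:t])  # input: historical frames only
        future = frames[t]                  # target: the next frame
        pairs.append((past, future))
    return pairs


def mean_squared_error(predicted: Frame, target: Frame) -> float:
    """Average squared pixel difference between a predicted and a real frame."""
    if len(predicted) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Toy clip of five 2-pixel frames whose brightness increases over time.
clip = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]
pairs = make_prediction_pairs(clip, history=2)

# Placeholder predictor (not the paper's model): copy the last historical frame.
past, future = pairs[0]
loss = mean_squared_error(past[-1], future)
```

In a real pipeline the placeholder predictor would be replaced by a trainable network, and the same pairing would apply to parsing maps wherever labels exist.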

Comments:	15 pages, 7 figures, 5 tables, currently v2
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1612.00119 [cs.CV]
	(or arXiv:1612.00119v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1612.00119

Submission history

From: Xiaojie Jin Mr.
[v1] Thu, 1 Dec 2016 02:48:48 UTC (1,742 KB)
[v2] Tue, 13 Dec 2016 04:55:42 UTC (1,826 KB)
