Joint head pose and facial landmark regression from depth images

Wang, Jie; Zhang, Juyong; Luo, Changwei; Chen, Falai

doi:10.1007/s41095-017-0082-8


Research Article
Open access
Published: 08 May 2017

Volume 3, pages 229–241 (2017)

Computational Visual Media


Abstract

This paper presents a method for jointly regressing head pose and facial landmarks from depth images in real time. Our main contributions are as follows. Firstly, we propose a joint optimization method for head pose and facial landmarks: the pose regression result provides a supervised initialization for cascaded facial landmark regression, while the landmark regression result in turn refines the head pose at each stage. Secondly, we partition the head pose space into 9 subspaces and, for each subspace, train a cascaded random forest with a global shape constraint to regress the facial landmarks. This classification-guided strategy effectively handles large pose variations and occlusion. Lastly, we have built a 3D face database containing 73 subjects, each with 14 expressions in various head poses. Experiments on challenging databases show that our method achieves state-of-the-art performance for both head pose estimation and facial landmark regression.
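The abstract describes the joint scheme only at a high level. The Python sketch below illustrates its alternating structure (pose initializes landmarks, landmarks refine pose, stage by stage) under several loudly flagged assumptions. The 9 pose subspaces are modelled as a hypothetical 3 × 3 grid over yaw and pitch with ±30° boundaries. The per-subspace cascaded random forests are replaced by hypothetical regressor callables. Pose refinement from landmarks uses a least-squares rigid (Kabsch) alignment of a mean 3D shape, which is a swapped-in technique rather than the authors' procedure. The names `pose_subspace`, `yaw_pitch_deg`, `rigid_align`, and `joint_cascade` are illustrative only.

```python
import numpy as np

# ASSUMPTION: the paper uses 9 pose subspaces; this 3 x 3 yaw/pitch grid
# and the +/-30 degree boundaries are hypothetical choices for illustration.
YAW_EDGES = (-30.0, 30.0)
PITCH_EDGES = (-30.0, 30.0)


def pose_subspace(yaw_deg: float, pitch_deg: float) -> int:
    """Map a (yaw, pitch) estimate in degrees to a subspace index 0..8."""
    yaw_bin = int(np.digitize(yaw_deg, YAW_EDGES))        # 0, 1 or 2
    pitch_bin = int(np.digitize(pitch_deg, PITCH_EDGES))  # 0, 1 or 2
    return 3 * pitch_bin + yaw_bin


def yaw_pitch_deg(R: np.ndarray):
    """Yaw and pitch in degrees, ASSUMING R = Rz(roll) @ Ry(yaw) @ Rx(pitch)."""
    yaw = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return float(yaw), float(pitch)


def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Kabsch: least-squares R, t such that dst ~= src @ R.T + t (N x 3 arrays)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)                 # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - src_c @ R.T
    return R, t


def joint_cascade(features, mean_shape, R0, t0, stages):
    """Alternate landmark updates and pose refinement over a cascade.

    stages[k] is a list of 9 HYPOTHETICAL regressors, one per pose subspace;
    each maps (features, current landmarks) to a landmark increment (N x 3).
    """
    R, t = R0, t0
    landmarks = mean_shape @ R.T + t  # pose-supervised initialization
    for stage in stages:
        yaw, pitch = yaw_pitch_deg(R)
        regressor = stage[pose_subspace(yaw, pitch)]  # subspace-specific model
        landmarks = landmarks + regressor(features, landmarks)
        R, t = rigid_align(mean_shape, landmarks)     # landmarks refine the pose
    return R, t, landmarks
```

With zero-valued regressors, the cascade leaves the pose-initialized landmarks unchanged and the rigid alignment returns the initial pose, which makes a convenient sanity check. In a full system, each stage would instead use a regressor trained on samples from its own pose subspace.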



Acknowledgements

We thank Luo Jiang and Boyi Jiang for their help in constructing the 3DFEP database. We thank the ETHZ-Computer Vision Lab for permission to use the BIWI Kinect Head Pose database and BIWI 3D Audiovisual Corpus of Affective Communication database. This work was supported by the National Key Technologies R&D Program of China (No. 2016YFC0800501), and the National Natural Science Foundation of China (No. 61672481).

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, Anhui, 230026, China
Jie Wang, Juyong Zhang, Changwei Luo & Falai Chen


Corresponding author

Correspondence to Juyong Zhang.

Additional information

This article is published with open access at Springerlink.com

Jie Wang is currently an M.S. student in the School of Computer Science at the University of Science and Technology of China. Her research interest is in computer vision and machine learning.

Juyong Zhang is an associate professor in the Department of Mathematics at the University of Science and Technology of China. He received his Ph.D. degree from the School of Computer Science and Engineering at Nanyang Technological University, Singapore. Before that, he received his B.S. degree in computer science and engineering from the University of Science and Technology of China in 2006. His research interests fall into the areas of computer graphics, computer vision, and machine learning.

Changwei Luo is a research assistant in the Department of Automation at the University of Science and Technology of China. His research interests cover computer vision and human-computer interaction.

Falai Chen is a professor in the Department of Mathematics at the University of Science and Technology of China. He received his B.S., M.S., and Ph.D. degrees from the University of Science and Technology of China in 1987, 1989, and 1994, respectively. His current research interests are in computer-aided geometric design and geometric modeling, specifically algebraic methods in geometric modeling, and splines over T-meshes and their applications to isogeometric analysis.

Rights and permissions

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.


About this article

Cite this article

Wang, J., Zhang, J., Luo, C. et al. Joint head pose and facial landmark regression from depth images. Comp. Visual Media 3, 229–241 (2017). https://doi.org/10.1007/s41095-017-0082-8


Received: 16 January 2017
Accepted: 08 March 2017
Published: 08 May 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s41095-017-0082-8
