Regression-Based Face Pose Estimation with Deep Multi-modal Feature Loss

Wu, Yanqiu; Hong, Chaoqun; Chen, Liang; Zeng, Zhiqiang

doi:10.1007/978-981-15-7981-3_39

Yanqiu Wu, Chaoqun Hong (ORCID: orcid.org/0000-0003-4472-7298), Liang Chen & Zhiqiang Zeng

Conference paper
First Online: 20 August 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1257)

Abstract

Image-based face pose estimation aims to estimate the facial direction from 2D images. It provides important information for many face recognition applications. However, it is a difficult task because of complex imaging conditions and varied facial appearances. Deep learning methods used in this field have the disadvantage of ignoring the natural structure of human faces. To address this problem, this paper proposes a framework that estimates face poses by regression, based on deep learning and a multi-modal feature loss (\(M^2FL\)). Unlike current loss functions, which use only a single type of feature, the proposed loss improves descriptive power by combining multiple image features. To achieve this, hypergraph-based manifold regularization is applied. In this way, the loss of face pose estimation is reduced. Experimental results on commonly used benchmark datasets demonstrate the performance of \(M^2FL\).
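
The abstract does not give the form of \(M^2FL\), so the following is only a minimal sketch of how a regression loss might be combined with hypergraph-based manifold regularization over several feature types. Everything in it is an assumption made for illustration rather than the authors' formulation: the k-nearest-neighbour hyperedge construction, the normalized hypergraph Laplacian \(L = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}\), the weighted sum of per-modality Laplacians, the trace regularizer, and the hypothetical names `knn_incidence`, `hypergraph_laplacian`, `multimodal_loss`, `modal_weights`, `k`, and `lam`.

```python
import numpy as np


def knn_incidence(features, k):
    """Incidence matrix H (n x n): hyperedge j holds sample j and its k nearest neighbours.

    Assumed construction for illustration; the paper may build hyperedges differently.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(dist[j])[: k + 1]
        H[members, j] = 1.0
        H[j, j] = 1.0  # make sure every sample belongs to its own hyperedge
    return H


def hypergraph_laplacian(H, edge_weights=None):
    """Normalized hypergraph Laplacian I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n, m = H.shape
    w = np.ones(m) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w          # vertex degrees (weighted count of hyperedges per sample)
    de = H.sum(axis=0)  # hyperedge degrees (number of samples per hyperedge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n) - theta


def multimodal_loss(pred, target, modal_features, modal_weights, k=3, lam=0.1):
    """Mean squared regression error plus a manifold penalty tr(pred^T L pred).

    L is a weighted sum of one hypergraph Laplacian per feature modality
    (an assumed way of combining modalities, not taken from the paper).
    """
    mse = np.mean((pred - target) ** 2)
    L = sum(
        a * hypergraph_laplacian(knn_incidence(f, k))
        for a, f in zip(modal_weights, modal_features)
    )
    reg = np.trace(pred.T @ L @ pred)
    return mse + lam * reg
```

In this sketch, `pred` and `target` would be arrays of pose angles with one row per image, and each entry of `modal_features` would hold one feature type extracted from the same images. The penalty is small when images that share hyperedges receive similar pose predictions, which is one common way to encode neighbourhood structure in a loss.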

This work was supported in part by the National Natural Science Foundation of China (61871464 and 61836002), the Fujian Provincial Natural Science Foundation of China (2018J01573), the Foundation of Fujian Educational Committee (JAT160357), the Distinguished Young Scientific Research Talents Plan in Universities of Fujian Province, and the Program for New Century Excellent Talents in University of Fujian Province.

Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
Yanqiu Wu, Chaoqun Hong & Zhiqiang Zeng
School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510006, China
Liang Chen

Corresponding author

Correspondence to Chaoqun Hong.

Editor information

Editors and Affiliations

North University of China, Taiyuan, China
Jianchao Zeng
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin University of Science and Technology, Harbin, China
Xianhua Song
National Academy of Guo Ding Institute of Data Science, Beijing, China
Zeguang Lu

Cite this paper

Wu, Y., Hong, C., Chen, L., Zeng, Z. (2020). Regression-Based Face Pose Estimation with Deep Multi-modal Feature Loss. In: Zeng, J., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2020. Communications in Computer and Information Science, vol 1257. Springer, Singapore. https://doi.org/10.1007/978-981-15-7981-3_39

DOI: https://doi.org/10.1007/978-981-15-7981-3_39
Published: 20 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7980-6
Online ISBN: 978-981-15-7981-3
eBook Packages: Computer Science, Computer Science (R0)
