Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation

Liu, Yuanyuan; Xie, Zhong; Gong, Xi; Fang, Fang

doi:10.1007/978-3-319-92753-4_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10799)

Included in the following conference series:

Pacific-Rim Symposium on Image and Video Technology

Abstract

In real-world applications, factors such as illumination, occlusion, and poor image quality make robust head pose estimation considerably more challenging. This paper proposes a deep transfer feature based convolutional neural forest method (D-CNF) for head pose estimation. First, deep transfer features are extracted from facial patches by a transfer network model. Then, a D-CNF is devised to integrate random trees with the representation learning of deep convolutional neural networks for robust head pose estimation. In the learning process, we introduce a neurally connected split function (NCSF) as the node splitting strategy in a convolutional neural tree. Experiments were conducted on the public Pointing’04, BU3D-HP, and CCNU-HP facial datasets. Compared with state-of-the-art methods, the proposed method achieved improved performance and robustness, with an average accuracy of 98.99% on BU3D-HP, 95.7% on Pointing’04, and 82.46% on CCNU-HP. In addition, in contrast to deep neural networks, which require large-scale training data, our method performs well even when only a small amount of training data is available.
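
The abstract does not give the form of the NCSF, so the following is only an illustrative sketch of how a neurally connected split might route a feature vector through one tree and how several trees might vote. It assumes a sigmoid unit over a weighted sum of the input features, a hard 0.5 threshold at each node, and majority voting across trees. The function names, the dictionary-based tree layout, and the pose labels are all hypothetical and are not taken from the paper.

```python
import math


def ncsf_probability(features, weights, bias):
    """Assumed split form: sigmoid of a weighted sum of the features.

    Returns the probability of sending the sample to the left child.
    """
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


def route(features, node):
    """Walk a sample from the root to a leaf, thresholding each split at 0.5."""
    while "leaf" not in node:
        p_left = ncsf_probability(features, node["weights"], node["bias"])
        node = node["left"] if p_left >= 0.5 else node["right"]
    return node["leaf"]


def forest_vote(features, trees):
    """Majority vote over the leaf labels returned by each tree."""
    votes = {}
    for tree in trees:
        label = route(features, tree)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical one-split tree with made-up pose labels.
toy_tree = {
    "weights": [1.0, -1.0],
    "bias": 0.0,
    "left": {"leaf": "yaw_-30"},
    "right": {"leaf": "yaw_+30"},
}

# Same split, but with the leaf labels swapped.
flipped_tree = {
    "weights": [1.0, -1.0],
    "bias": 0.0,
    "left": {"leaf": "yaw_+30"},
    "right": {"leaf": "yaw_-30"},
}

prediction = forest_vote([2.0, 0.5], [toy_tree, toy_tree, flipped_tree])
```

In a trained model the weights would presumably be learned jointly with the convolutional features rather than fixed by hand as they are here.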

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61602429), China Postdoctoral Science Foundation (No. 2016M592406), and Research Funds of CUG from the Colleges Basic Research and Operation of MOE (No. 26420160055).

Author information

Authors and Affiliations

Faculty of Information Engineering, China University of Geosciences, Wuhan, 430074, China
Yuanyuan Liu, Zhong Xie, Xi Gong & Fang Fang

Corresponding author

Correspondence to Xi Gong.

Editor information

Editors and Affiliations

National Institute of Informatics, Tokyo, Japan
Shin'ichi Satoh

Cite this paper

Liu, Y., Xie, Z., Gong, X., Fang, F. (2018). Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation. In: Satoh, S. (eds) Image and Video Technology. PSIVT 2017. Lecture Notes in Computer Science, vol 10799. Springer, Cham. https://doi.org/10.1007/978-3-319-92753-4_1

DOI: https://doi.org/10.1007/978-3-319-92753-4_1
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92752-7
Online ISBN: 978-3-319-92753-4
eBook Packages: Computer Science, Computer Science (R0)
