
Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation

  • Conference paper
Image and Video Technology (PSIVT 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10799)

Abstract

In real-world applications, factors such as illumination, occlusion, and poor image quality make robust head pose estimation much more challenging. In this paper, a novel deep transfer feature based convolutional neural forest (D-CNF) method is proposed for head pose estimation. First, deep transfer features are extracted from facial patches by a transfer network model. Then, a D-CNF is devised that integrates random trees with the representation learning of deep convolutional neural networks for robust head pose estimation. In the learning process, we introduce a neurally connected split function (NCSF) as the node-splitting strategy in a convolutional neural tree. Experiments were conducted on the public Pointing’04, BU3D-HP, and CCNU-HP facial datasets. Compared with state-of-the-art methods, the proposed method achieved improved accuracy and robustness, with average accuracies of 98.99% on BU3D-HP, 95.7% on Pointing’04, and 82.46% on CCNU-HP. In addition, unlike deep neural networks, which require large-scale training data, our method performs well even with only a small amount of training data.
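To illustrate how a tree can combine a learned feature representation with tree-structured decisions, here is a minimal sketch of soft, sigmoid-based routing in the style of deep neural decision forests. It is not the authors' NCSF: the linear split functions, the breadth-first node layout, the names `SoftTree` and `forest_predict`, and the plain feature vector `x` standing in for a deep transfer feature are all illustrative assumptions. Each split node sends a sample left with probability d_n(x), the probability of reaching a leaf is the product of the routing probabilities along its path, and the tree's output is the reach-weighted mixture of the leaf class distributions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: split score -> routing probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SoftTree:
    """Hypothetical soft decision tree of fixed depth (illustrative, not the paper's NCSF).

    Internal nodes are stored breadth-first: node 0 is the root and the nodes on
    level k occupy indices 2**k - 1 .. 2**(k+1) - 2. Each internal node n scores x
    with an assumed linear function f_n(x) = w_n . x + b_n, and
    d_n(x) = sigmoid(f_n(x)) is the probability of routing x to the left child.
    Each of the 2**depth leaves stores a class distribution pi_l.
    """

    def __init__(self, depth: int, weights: List[List[float]], biases: List[float],
                 leaf_dists: List[List[float]]) -> None:
        assert len(weights) == len(biases) == 2 ** depth - 1
        assert len(leaf_dists) == 2 ** depth
        self.depth = depth
        self.weights = weights
        self.biases = biases
        self.leaf_dists = leaf_dists

    def leaf_probs(self, x: Sequence[float]) -> List[float]:
        """Return mu_l(x), the probability that x reaches each leaf l (left to right)."""
        mu = [1.0]  # the root is reached with probability 1
        for level in range(self.depth):
            first = 2 ** level - 1  # index of the first node on this level
            nxt: List[float] = []
            for i, reach in enumerate(mu):
                n = first + i
                score = sum(w * xi for w, xi in zip(self.weights[n], x)) + self.biases[n]
                d = sigmoid(score)
                nxt.append(reach * d)          # mass routed to the left child
                nxt.append(reach * (1.0 - d))  # mass routed to the right child
            mu = nxt
        return mu

    def predict(self, x: Sequence[float]) -> List[float]:
        """Class distribution: sum over leaves of mu_l(x) * pi_l."""
        mu = self.leaf_probs(x)
        n_classes = len(self.leaf_dists[0])
        return [sum(m * pi[c] for m, pi in zip(mu, self.leaf_dists))
                for c in range(n_classes)]


def forest_predict(trees: Sequence[SoftTree], x: Sequence[float]) -> List[float]:
    """Average the per-tree class distributions, as in a standard random forest."""
    preds = [t.predict(x) for t in trees]
    return [sum(p[c] for p in preds) / len(preds) for c in range(len(preds[0]))]


if __name__ == "__main__":
    # Toy two-class stump (e.g. two hypothetical yaw bins).
    stump = SoftTree(1, [[1.0, 0.0]], [0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(stump.predict([0.0, 0.0]))  # -> [0.5, 0.5]
    print(forest_predict([stump], [4.0, 0.0]))  # mass shifts toward the first class
```

Because every routing probability is a differentiable function of the split parameters and of the features that feed them, a tree of this kind can in principle be trained end to end with gradient descent together with a convolutional network; the paper's own training procedure is not reproduced here.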

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61602429), China Postdoctoral Science Foundation (No. 2016M592406), and Research Funds of CUG from the Colleges Basic Research and Operation of MOE (No. 26420160055).

Author information

Corresponding author

Correspondence to Xi Gong.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Liu, Y., Xie, Z., Gong, X., Fang, F. (2018). Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation. In: Satoh, S. (eds) Image and Video Technology. PSIVT 2017. Lecture Notes in Computer Science, vol 10799. Springer, Cham. https://doi.org/10.1007/978-3-319-92753-4_1

  • DOI: https://doi.org/10.1007/978-3-319-92753-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92752-7

  • Online ISBN: 978-3-319-92753-4

  • eBook Packages: Computer Science, Computer Science (R0)