Skip to main content

Simultaneous Face Detection and Head Pose Estimation: A Fast and Unified Framework

  • Conference paper
  • First Online:
Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11361))

Included in the following conference series:

  • 2065 Accesses

Abstract

In this paper, we present a fast and unified framework for simultaneous face detection and 3D pose (pitch, yaw, roll) estimation of unconstrained faces using deep convolutional neural networks (CNN). Face detection is implemented with region-based framework as previous work like Faster RCNN. We model the pose estimation as a classification and regression problem: first divide continuous head poses into several discrete clusters, then adjust poses within each class with a class-specific regressor to achieve more accurate results. All classification and regressions for the two tasks are trained and tested simultaneously in one unified network. Our approach runs at 10 fps, which is the fastest implementation among the recently proposed methods as far as we know. Moreover, it is able to predict pose without using any 3D information. Extensive evaluations on several challenging benchmarks such as AFLW and AFW demonstrate the effectiveness of the proposed method with competitive results.

X. Zhao—This research is supported by the funding from NSFC programs (61673269, 61273285, U1764264).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Benenson, R., Mathias, M., Tuytelaars, T., Van Gool, L.: Seeking the strongest rigid detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3666–3673 (2013)

    Google Scholar 

  2. Chen, D., Ren, S., Wei, Y., Cao, X., Sun, J.: Joint cascade face detection and alignment. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 109–122. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_8

    Chapter  Google Scholar 

  3. Doshi, A., Trivedi, M.M.: Head and eye gaze dynamics during visual attention shifts in complex environments. J. Vis. 12(2), 9–9 (2012)

    Article  Google Scholar 

  4. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The pascal visual object classes challenge 2012 (voc2012) results (2012) (2010). http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html

  5. Farfade, S.S., Saberian, M.J., Li, L.J.: Multi-view face detection using deep convolutional neural networks. In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, pp. 643–650. ACM (2015)

    Google Scholar 

  6. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

    Article  Google Scholar 

  7. Fu, Y., Huang, T.S.: Graph embedded analysis for head pose estimation. In: 7th International Conference on Automatic Face and Gesture Recognition FGR 2006, p. 6. IEEE (2006)

    Google Scholar 

  8. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

    Google Scholar 

  9. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

    Google Scholar 

  10. Gu, C., Ren, X.: Discriminative mixture-of-templates for viewpoint classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15555-0_30

    Chapter  Google Scholar 

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

    Google Scholar 

  12. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)

    Google Scholar 

  13. Jones, M., Viola, P.: Fast multi-view face detection. Mitsubishi Electric Res. Lab TR-20003-96 3(14), 2 (2003)

    Google Scholar 

  14. Katzenmaier, M., Stiefelhagen, R., Schultz, T.: Identifying the addressee in human-human-robot interactions based on head pose and speech. In: International Conference on Multimodal Interaction, pp. 144–151 (2004)

    Google Scholar 

  15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

    Google Scholar 

  16. Kumar, A., Alavi, A., Chellappa, R.: Kepler: keypoint and pose estimation of unconstrained faces by learning efficient H-CNN regressors. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 258–265, May 2017. https://doi.org/10.1109/FG.2017.149

  17. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129–137 (1982)

    Article  MathSciNet  Google Scholar 

  18. Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: Proceedings of First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (2011)

    Google Scholar 

  19. Mathias, M., Benenson, R., Pedersoli, M., Van Gool, L.: Face detection without bells and whistles. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 720–735. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_47

    Chapter  Google Scholar 

  20. Mukherjee, S.S., Robertson, N.M.: Deep head pose: gaze-direction estimation in multimodal video. IEEE Trans. Multimedia 17(11), 2094–2107 (2015)

    Article  Google Scholar 

  21. Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation in computer vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 607–626 (2009)

    Article  Google Scholar 

  22. Osadchy, M., Cun, Y.L., Miller, M.L.: Synergistic face detection and pose estimation with energy-based models. J. Mach. Learn. Res. 8(May), 1197–1215 (2007)

    Google Scholar 

  23. Qin, H., Yan, J., Li, X., Hu, X.: Joint training of cascaded CNN for face detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3456–3465 (2016)

    Google Scholar 

  24. Ramanathan, S., Yan, Y., Staiano, J., Lanz, O., Sebe, N.: On the relationship between head pose, social attention and personality prediction for unstructured and dynamic group interactions. In: ICMI, pp. 3–10 (2013)

    Google Scholar 

  25. Ranjan, R., Patel, V.M., Chellappa, R.: Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1 (2017). https://doi.org/10.1109/TPAMI.2017.2781233

    Article  Google Scholar 

  26. Ranjan, R., Patel, V.M., Chellappa, R.: A deep pyramid deformable part model for face detection. In: 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems Biometrics Theory, Applications and Systems (BTAS), pp. 1–8. IEEE (2015)

    Google Scholar 

  27. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

    Google Scholar 

  28. Shen, X., Lin, Z., Brandt, J., Wu, Y.: Detecting and aligning faces by image retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3460–3467 (2013)

    Google Scholar 

  29. Sherrah, J., Gong, S., Ong, E.J.: Face distributions in similarity space under varying head pose. Image Vis. Comput. 19(12), 807–819 (2001)

    Article  Google Scholar 

  30. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  31. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)

    Article  Google Scholar 

  32. Xiang, Y., Choi, W., Lin, Y., Savarese, S.: Subcategory-aware convolutional neural networks for object proposals and detection. arXiv preprint arXiv:1604.04693 (2016)

  33. Yan, J., Zhang, X., Lei, Z., Li, S.Z.: Face detection by structural models. Image Vis. Comput. 32(10), 790–799 (2014)

    Article  Google Scholar 

  34. Yang, S., Luo, P., Loy, C.C., Tang, X.: From facial parts responses to face detection: a deep learning approach. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3676–3684 (2015)

    Google Scholar 

  35. Zafeiriou, S., Zhang, C., Zhang, Z.: A survey on face detection in the wild: past, present and future. Comput. Vis. Image Underst. 138, 1–24 (2015)

    Article  Google Scholar 

  36. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2879–2886. IEEE (2012)

    Google Scholar 

  37. Zhu, X., Ramanan, D.: FACEDPL: detection, pose estimation, and landmark localization in the wild. Preprint 1(2), 6 (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xu Zhao .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Li, T., Zhao, X. (2019). Simultaneous Face Detection and Head Pose Estimation: A Fast and Unified Framework. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-20887-5_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20886-8

  • Online ISBN: 978-3-030-20887-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics