Abstract
Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation through a Convolutional Neural Network (CNN). Unlike other proposals in the literature, the described system works directly on raw depth data alone. Moreover, head pose estimation is solved as a regression problem and does not rely on visual facial features such as facial landmarks. We tested our system on a well-known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and meets real-time performance requirements.
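Framing head pose estimation as regression means the network outputs continuous angles instead of discrete pose classes. The sketch below is illustrative only. It assumes the output is the three Euler angles (yaw, pitch, roll) in degrees, a squared-error training objective, and per-angle mean absolute error as the evaluation metric. None of these choices, nor the hypothetical function names `squared_error_loss` and `per_angle_mae`, are taken from the paper's text.

```python
from typing import Dict, Sequence, Tuple

# (yaw, pitch, roll) in degrees; an assumed output layout for illustration.
Angles = Tuple[float, float, float]
ANGLE_NAMES = ("yaw", "pitch", "roll")


def _check(pred: Sequence[Angles], target: Sequence[Angles]) -> None:
    # Both sequences must be non-empty and aligned sample by sample.
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")


def squared_error_loss(pred: Sequence[Angles], target: Sequence[Angles]) -> float:
    """Mean squared error over all samples and all three angles (assumed objective)."""
    _check(pred, target)
    total = sum((p[i] - t[i]) ** 2 for p, t in zip(pred, target) for i in range(3))
    return total / (3 * len(pred))


def per_angle_mae(pred: Sequence[Angles], target: Sequence[Angles]) -> Dict[str, float]:
    """Mean absolute error in degrees, reported separately for yaw, pitch and roll."""
    _check(pred, target)
    n = len(pred)
    return {
        name: sum(abs(p[i] - t[i]) for p, t in zip(pred, target)) / n
        for i, name in enumerate(ANGLE_NAMES)
    }


if __name__ == "__main__":
    # Toy numbers for illustration, not results from the paper.
    pred = [(10.0, -5.0, 2.0), (0.0, 3.0, -1.0)]
    target = [(12.0, -4.0, 2.0), (1.0, 0.0, 0.0)]
    print(per_angle_mae(pred, target))  # {'yaw': 1.5, 'pitch': 2.0, 'roll': 0.5}
```

Reporting the error per angle, rather than as one pooled number, makes it visible when a model handles yaw well but struggles with pitch or roll.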
Notes
1. The tool is written in Java and is completely free and open source. It takes as input the JSON file produced by the Keras framework and generates images in common formats such as PNG, JPEG, or GIF. We invite readers to test and use this software, in the hope that it helps in deep learning studies and presentations. The code can be downloaded at the following link:
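The note does not describe how the Java tool parses its input. As a rough, hypothetical illustration in Python (not the tool's code), the first step of any such visualizer is to read the layer list from the architecture JSON that Keras emits via `model.to_json()`. The function names `list_layers` and `describe`, and the handling of two JSON layouts, are assumptions made for this sketch.

```python
import json
from typing import Any, Dict, List, Tuple


def list_layers(model_json: str) -> List[Tuple[str, str]]:
    """Return (layer type, layer name) pairs from a Keras architecture JSON string."""
    model: Dict[str, Any] = json.loads(model_json)
    config = model.get("config", {})
    # Some older Keras releases serialized a Sequential model's layers as a bare
    # list under "config"; later releases nest them under config["layers"].
    layers = config if isinstance(config, list) else config.get("layers", [])
    pairs = []
    for layer in layers:
        layer_cfg = layer.get("config", {})
        pairs.append((layer.get("class_name", "?"), layer_cfg.get("name", "?")))
    return pairs


def describe(model_json: str) -> str:
    """Render the layer sequence as a one-line text diagram (a stand-in for an image)."""
    return " -> ".join(f"{name} ({kind})" for kind, name in list_layers(model_json))


if __name__ == "__main__":
    # A hand-written toy architecture, not the network from the paper.
    example = json.dumps({
        "class_name": "Sequential",
        "config": {"name": "toy", "layers": [
            {"class_name": "Conv2D", "config": {"name": "conv2d_1"}},
            {"class_name": "Dense", "config": {"name": "dense_1"}},
        ]},
    })
    print(describe(example))  # conv2d_1 (Conv2D) -> dense_1 (Dense)
```

A real visualizer would then lay these entries out graphically and export PNG, JPEG, or GIF images, which this sketch does not attempt.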
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Venturelli, M., Borghi, G., Vezzani, R., Cucchiara, R. (2018). Deep Head Pose Estimation from Depth Data for In-Car Automotive Applications. In: Wannous, H., Pala, P., Daoudi, M., Flórez-Revuelta, F. (eds) Understanding Human Activities Through 3D Sensors. UHA3DS 2016. Lecture Notes in Computer Science, vol 10188. Springer, Cham. https://doi.org/10.1007/978-3-319-91863-1_6
DOI: https://doi.org/10.1007/978-3-319-91863-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91862-4
Online ISBN: 978-3-319-91863-1
eBook Packages: Computer Science, Computer Science (R0)