Abstract
In real-world applications, factors such as illumination, occlusion, and poor image quality make robust head pose estimation challenging. In this paper, a novel deep transfer feature based convolutional neural forest method (D-CNF) is proposed for head pose estimation. First, deep transfer features are extracted from facial patches by a transfer network model. Then, a D-CNF is devised to integrate random trees with the representation learning of deep convolutional neural networks for robust head pose estimation. In the learning process, we introduce a neurally connected split function (NCSF) as the node splitting strategy in a convolutional neural tree. Experiments were conducted on the public Pointing’04, BU3D-HP, and CCNU-HP facial datasets. Compared with state-of-the-art methods, the proposed method achieved improved performance and robustness, with an average accuracy of 98.99% on the BU3D-HP dataset, 95.7% on Pointing’04, and 82.46% on the CCNU-HP dataset. In addition, in contrast to deep neural networks, which require large-scale training data, our method performs well even when only a small amount of training data is available.
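The abstract does not give the form of the NCSF, so the following is only a minimal sketch of the general idea of a neurally parameterised split node, under stated assumptions: the split is a single sigmoid unit over an already extracted feature vector, and a sample is routed left when the unit's output is at least 0.5. The names `SplitNode`, `prob_left`, `route`, and `split_samples`, as well as the weights and sample values, are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class SplitNode:
    """Hypothetical split node: one sigmoid unit over a feature vector.

    This is an illustrative assumption, not the paper's NCSF definition.
    """

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def prob_left(self, features):
        # Linear score followed by a sigmoid, read as P(route left).
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return sigmoid(z)

    def route(self, features):
        # Hard routing decision, thresholded at 0.5 (assumed).
        return "left" if self.prob_left(features) >= 0.5 else "right"


def split_samples(node, samples):
    """Partition feature vectors between the node's two children."""
    left, right = [], []
    for features in samples:
        (left if node.route(features) == "left" else right).append(features)
    return left, right


# Toy 2-D feature vectors standing in for deep transfer features.
node = SplitNode(weights=[1.0, -1.0], bias=0.0)
samples = [[2.0, 0.5], [0.1, 3.0], [1.0, 1.0]]
left, right = split_samples(node, samples)
print(left, right)
```

In a full tree, each child would apply its own split recursively until a leaf stores a distribution over head poses; how the split weights are learned is not specified here and is left out of the sketch.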
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61602429), China Postdoctoral Science Foundation (No. 2016M592406), and Research Funds of CUG from the Colleges Basic Research and Operation of MOE (No. 26420160055).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Liu, Y., Xie, Z., Gong, X., Fang, F. (2018). Deep Transfer Feature Based Convolutional Neural Forests for Head Pose Estimation. In: Satoh, S. (eds) Image and Video Technology. PSIVT 2017. Lecture Notes in Computer Science, vol 10799. Springer, Cham. https://doi.org/10.1007/978-3-319-92753-4_1
DOI: https://doi.org/10.1007/978-3-319-92753-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92752-7
Online ISBN: 978-3-319-92753-4
eBook Packages: Computer Science (R0)