Abstract
Human pose estimation in images is an important branch of computer vision and graphics research. In this paper, an improved modular convolutional neural network is proposed for human pose estimation in static 2D images. A cascaded three-stage fully convolutional network (FCN) learns the non-linear mapping from image feature space to human pose space end to end. To improve the accuracy of joint prediction, multi-feature source fusion is adopted in the pose estimation process. The first two stages of the network focus on learning local image features and pixel features in the neighborhood of each joint, and these features are merged in the third stage. Finally, the coordinates of the human joints are obtained by regression on the merged features. In our experiments, under the strict PCP criterion on the full-body pose dataset LSP, the average prediction accuracy of our method is 79.3%. In addition, under the PCKh criterion on the upper-body pose dataset FLIC, our method achieves an average prediction accuracy of 93% without additional training.
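The abstract reports results under strict PCP and PCKh without spelling out either criterion. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' evaluation code. The function names, the joint and limb layout, and the threshold factor of 0.5 are assumptions chosen for illustration; the abstract does not state which threshold was used.

```python
import math

def pckh(pred, gt, head_size, alpha=0.5):
    """PCKh: fraction of joints predicted within alpha * head_size of ground truth.

    pred, gt: lists of (x, y) joint coordinates in the same order.
    head_size: length of the ground-truth head segment (assumed given).
    alpha: threshold factor; 0.5 is a common choice, assumed here.
    """
    threshold = alpha * head_size
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)

def strict_pcp(pred, gt, limbs, alpha=0.5):
    """Strict PCP: a limb counts as correct only if BOTH of its endpoints
    lie within alpha * (ground-truth limb length) of their true positions.

    limbs: list of (joint_index_a, joint_index_b) pairs (hypothetical layout).
    """
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])
        ok_a = math.dist(pred[a], gt[a]) <= alpha * limb_len
        ok_b = math.dist(pred[b], gt[b]) <= alpha * limb_len
        if ok_a and ok_b:
            correct += 1
    return correct / len(limbs)

# Toy example: three joints on a vertical line, two limbs.
gt = [(0, 0), (0, 10), (0, 20)]
pred = [(0, 1), (0, 10), (0, 30)]   # the last joint is off by 10 pixels
limbs = [(0, 1), (1, 2)]

pckh_score = pckh(pred, gt, head_size=10)
pcp_score = strict_pcp(pred, gt, limbs)
```

In this toy case two of the three joints fall within the PCKh threshold, and only the first limb has both endpoints within the strict PCP threshold. Strict PCP is the harsher of the two PCP variants because a single badly placed endpoint invalidates the whole limb.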
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61603066), the Program for the Liaoning Distinguished Professor, the Hunan Provincial Natural Science Fund Project (No. 2015JJ6028), the Excellent Youth Project of Hunan Education Department (No. 16B065), the Science and Technology Innovation Fund of Dalian (No. 2018J12GX036), the High-level Talent Innovation Support Project of Dalian (No. 2017RD11), and the Equipment Pre-research Foundation for Key Laboratory of National Defense Science and Technology (No. 614222202040571).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Dong, J., Zhou, D., Fang, X., Wei, X. (2019). Improved Modular Convolution Neural Network for Human Pose Estimation. In: El Rhalibi, A., Pan, Z., Jin, H., Ding, D., Navarro-Newball, A., Wang, Y. (eds) E-Learning and Games. Edutainment 2018. Lecture Notes in Computer Science, vol 11462. Springer, Cham. https://doi.org/10.1007/978-3-030-23712-7_53
DOI: https://doi.org/10.1007/978-3-030-23712-7_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23711-0
Online ISBN: 978-3-030-23712-7
eBook Packages: Computer Science, Computer Science (R0)