Improved Modular Convolution Neural Network for Human Pose Estimation

Zhang, Zhengxuan; Dong, Jing; Zhou, Dongsheng; Fang, Xiaoyong; Wei, Xiaopeng

doi:10.1007/978-3-030-23712-7_53

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11462)

Included in the following conference series:

International Conference on E-Learning and Games

Abstract

Human pose estimation in images is an important branch of computer vision and graphics research. In this paper, an improved modular convolution neural network is proposed for human pose estimation in static 2D images. A cascaded three-stage fully convolutional network (FCN) learns the non-linear mapping from image feature space to human pose space end to end. To improve joint prediction accuracy, multi-feature source fusion is adopted in the pose estimation process. The first two stages of the network focus on learning local image features and the pixel features of joint neighborhoods, and these features are merged in the third stage. Finally, the coordinates of the human joints are obtained by regression on the merged features. In our experiments, under the strict PCP criterion on the full-body pose dataset LSP, the average prediction accuracy of our method is 79.3%. In addition, under the PCKh standard on the upper-body pose dataset FLIC, our method achieves an average prediction accuracy of 93% without additional training.
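The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact definitions or thresholds used, so the code below assumes the commonly used forms: strict PCP counts a limb as correct only if both predicted endpoints lie within half the ground-truth limb length of their true positions, and PCKh counts a joint as correct if it lies within a fraction `alpha` of the head size. The function names, the 0.5 thresholds, and the toy coordinates are illustrative assumptions, not the authors' evaluation code.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def strict_pcp(pred, gt, limbs, thresh=0.5):
    """Fraction of limbs judged correct under an assumed strict PCP rule.

    A limb (a, b) is correct only if BOTH predicted endpoints fall within
    thresh * (ground-truth limb length) of their ground-truth positions.
    """
    correct = 0
    for a, b in limbs:
        limit = thresh * dist(gt[a], gt[b])
        if dist(pred[a], gt[a]) <= limit and dist(pred[b], gt[b]) <= limit:
            correct += 1
    return correct / len(limbs)


def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of joints within alpha * head_size of the ground truth."""
    limit = alpha * head_size
    hits = sum(1 for p, g in zip(pred, gt) if dist(p, g) <= limit)
    return hits / len(gt)


# Toy example with three joints and two limbs (hypothetical values).
gt = [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0)]
pred = [(0.0, 1.0), (0.0, 9.0), (0.0, 30.0)]
limbs = [(0, 1), (1, 2)]

print(strict_pcp(pred, gt, limbs))       # one of the two limbs is correct
print(pckh(pred, gt, head_size=4.0))     # two of the three joints are correct
```

In the toy example the third joint is off by 10 pixels, which exceeds both the limb-based limit (5 pixels) and the head-based limit (2 pixels), so it fails under both measures. Dataset-level scores such as those quoted in the abstract would average these per-image fractions over all test images.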

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61603066), the Program for the Liaoning Distinguished Professor, the Hunan Provincial Natural Science Fund Project (No. 2015JJ6028), the Excellent Youth Project of Hunan Education Department (No. 16B065), the Science and Technology Innovation Fund of Dalian (No. 2018J12GX036), the High-level Talent Innovation Support Project of Dalian (No. 2017RD11), and the Equipment Pre-research Foundation for Key Laboratory of National Defense Science and Technology (No. 614222202040571).

Author information

Authors and Affiliations

Key Laboratory of Advanced Design and Intelligent Computing, Dalian University, Ministry of Education, Dalian, 116622, China
Zhengxuan Zhang, Jing Dong & Dongsheng Zhou
School of Computer Science and Technology, Dalian University of Technology, Dalian, 1160243, China
Xiaopeng Wei
Research Institute of Human, Factors and Safety Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan Province, China
Xiaoyong Fang

Corresponding authors

Correspondence to Jing Dong or Dongsheng Zhou.

Editor information

Editors and Affiliations

Liverpool John Moores University, Liverpool, UK
Abdennour El Rhalibi
Hangzhou Normal University, Hangzhou, China
Zhigeng Pan
Xi’an University of Technology, Xi’an, China
Haiyan Jin
Hangzhou Normal University, Hangzhou, China
Dandan Ding
Pontificia Universidad Javeriana, Cali, Colombia
Andres A. Navarro-Newball
Xi’an University of Technology, Xi’an, China
Yinghui Wang

About this paper

Cite this paper

Zhang, Z., Dong, J., Zhou, D., Fang, X., Wei, X. (2019). Improved Modular Convolution Neural Network for Human Pose Estimation. In: El Rhalibi, A., Pan, Z., Jin, H., Ding, D., Navarro-Newball, A., Wang, Y. (eds) E-Learning and Games. Edutainment 2018. Lecture Notes in Computer Science, vol 11462. Springer, Cham. https://doi.org/10.1007/978-3-030-23712-7_53

DOI: https://doi.org/10.1007/978-3-030-23712-7_53
Published: 17 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23711-0
Online ISBN: 978-3-030-23712-7
eBook Packages: Computer Science, Computer Science (R0)
