Improved Modular Convolution Neural Network for Human Pose Estimation

  • Conference paper
E-Learning and Games (Edutainment 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11462)

Abstract

Human pose estimation in images is an important branch of computer vision and graphics research. This paper proposes an improved modular convolutional neural network for human pose estimation in static 2D images. A cascaded three-stage fully convolutional network (FCN) learns the non-linear mapping from image feature space to human pose space end to end. To improve the accuracy of joint prediction, the method fuses multiple feature sources. The first two stages of the network focus on learning local image features and the pixel features of joint neighborhoods, and these features are merged in the third stage. Finally, the coordinates of the human joints are obtained by regression on the merged features. In our experiments, under the strict PCP criterion on the full-body pose dataset LSP, the average prediction accuracy of our method is 79.3%. In addition, under the PCKh standard on the upper-body pose dataset FLIC, our method achieves an average prediction accuracy of 93% without additional training.
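The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It follows the commonly used definitions of PCKh (a joint counts as correct when it lies within a fraction of the head size from the ground truth) and strict PCP (a limb counts as correct when both of its endpoints lie within a fraction of the limb length). The 0.5 thresholds, the function names, and the toy coordinates are assumptions for illustration; the abstract does not state the exact protocol the authors used.

```python
import math

def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of joints predicted within alpha * head_size of ground truth.

    alpha=0.5 is the usual PCKh@0.5 setting (assumed, not from the paper).
    """
    hits = sum(
        1 for p, g in zip(pred, gt)
        if math.dist(p, g) <= alpha * head_size
    )
    return hits / len(gt)

def strict_pcp(pred, gt, limbs, thresh=0.5):
    """Fraction of limbs whose two endpoints are BOTH within
    thresh * (ground-truth limb length) of their ground-truth positions."""
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])
        if (math.dist(pred[a], gt[a]) <= thresh * limb_len
                and math.dist(pred[b], gt[b]) <= thresh * limb_len):
            correct += 1
    return correct / len(limbs)

# Toy example (hypothetical data): three joints on a line, two limbs.
gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
pred = [(1.0, 0.0), (10.0, 4.0), (20.0, 12.0)]
limbs = [(0, 1), (1, 2)]  # index pairs into the joint list

pckh_score = pckh(pred, gt, head_size=10.0)
pcp_score = strict_pcp(pred, gt, limbs)
```

In this toy case two of the three joints fall within the PCKh threshold, and only the first limb has both endpoints within half its length, so the third joint's large error lowers both scores.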

Notes

  1. http://human-pose.mpi-inf.mpg.de/

  2. https://bensapp.github.io/flic-dataset.html

  3. http://sam.johnson.io/research/lsp.html

  4. http://sam.johnson.io/research/lspet.html

  5. http://human-pose.mpi-inf.mpg.de/#related_benchmarks

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61603066), the Program for the Liaoning Distinguished Professor, the Hunan Provincial Natural Science Fund Project (No. 2015JJ6028), the Excellent Youth Project of Hunan Education Department (No. 16B065), the Science and Technology Innovation Fund of Dalian (No. 2018J12GX036), the High-level talent innovation support project of Dalian (No. 2017RD11), and the Equipment Pre-research Foundation for Key Laboratory of National Defense Science and Technology (No. 614222202040571).

Author information

Corresponding authors

Correspondence to Jing Dong or Dongsheng Zhou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Dong, J., Zhou, D., Fang, X., Wei, X. (2019). Improved Modular Convolution Neural Network for Human Pose Estimation. In: El Rhalibi, A., Pan, Z., Jin, H., Ding, D., Navarro-Newball, A., Wang, Y. (eds) E-Learning and Games. Edutainment 2018. Lecture Notes in Computer Science, vol 11462. Springer, Cham. https://doi.org/10.1007/978-3-030-23712-7_53

  • DOI: https://doi.org/10.1007/978-3-030-23712-7_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23711-0

  • Online ISBN: 978-3-030-23712-7

  • eBook Packages: Computer Science, Computer Science (R0)
