Simple Fine-Tuning Attention Modules for Human Pose Estimation

  • Conference paper
Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1287))

Abstract

Convolutional neural networks (CNNs) have achieved the best performance not only in human pose estimation but also in other computer vision tasks (e.g., object detection, semantic segmentation, image classification). This paper focuses on an attention module (AM) for feed-forward CNNs. First, the feature map produced by a block of the backbone network is fed into the attention module, which processes it along two separate dimensions, channel and spatial. The AM then combines the two resulting feature maps by multiplication and passes the result to the next block of the backbone. The network can thus capture both long-range dependencies (channel) and spatial information, which improves accuracy. Our experimental results illustrate the difference between using the attention module and existing methods. The predicted joint heatmaps maintain accuracy and are spatially better than those of the simple baseline. In addition, the proposed architecture achieves an AP 1.0 points higher than the baseline. The proposed network is trained on the COCO 2017 benchmark, which is a publicly accessible dataset.
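
The data flow described in the abstract (a block's feature map goes in, channel and spatial attention are computed, and the two are combined by multiplication before the next block) can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the parameter-free gates (average pooling followed by a sigmoid), the function names, and the channel-by-height-by-width nested-list layout are all assumptions made for this example. A real module would normally use learned layers inside each gate.

```python
import math


def sigmoid(v):
    """Logistic function used to squash pooled values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x):
    """Assumed channel attention: one weight per channel.

    Each channel is globally average-pooled over its H x W grid and
    passed through a sigmoid. No learned parameters (an assumption).
    """
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in x
    ]


def spatial_gate(x):
    """Assumed spatial attention: one weight per (h, w) location.

    Values are averaged across channels at each location and passed
    through a sigmoid. No learned parameters (an assumption).
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def attention_module(x):
    """Reweight a C x H x W feature map by both gates via multiplication.

    The output has the same shape as the input, so it can be handed to
    the next backbone block unchanged in size.
    """
    ch = channel_gate(x)
    sp = spatial_gate(x)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[x[k][i][j] * ch[k] * sp[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Hypothetical 2-channel, 2 x 2 feature map standing in for a block output.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
refined = attention_module(features)
```

Because the output keeps the input's shape, a module of this kind can be placed between existing backbone blocks without changing the layers around it.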

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2020R1A2C2008972).

Author information

Corresponding author

Correspondence to Kang-Hyun Jo.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tran, TD., Vo, XT., Russo, MA., Jo, KH. (2020). Simple Fine-Tuning Attention Modules for Human Pose Estimation. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_15

  • DOI: https://doi.org/10.1007/978-3-030-63119-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63118-5

  • Online ISBN: 978-3-030-63119-2

  • eBook Packages: Computer Science, Computer Science (R0)
