More accurate heatmap generation method for human pose estimation

  • Regular Paper
  • Multimedia Systems

Abstract

Human pose estimation plays a crucial role in computer vision tasks such as understanding body language and tracking behavior. We observe that neural networks trained to generate heatmaps of human joints often produce blurred outputs that lack a well-defined Gaussian structure. Despite this, existing methods largely prioritize innovations in network architecture and neglect heatmap generation itself. Motivated by this observation, we propose a novel approach that incorporates a visual center module and a heatmap enhancer to improve existing human pose estimation methods. First, we extract features at different depths from the backbone network, which can be any model based on convolutional neural networks. The visual center module then captures the global long-range dependencies and the cross-channel information of these features, which facilitates heatmap generation. Finally, the heatmap enhancer adjusts the distribution of heatmap values. It uses a Gaussian filter to handle multiple peaks around the maximum activation, allowing the heatmap to localize the body's joints accurately. We combine our modules with current mainstream human pose estimation methods in our experiments. The results show that the proposed method achieves good results on two benchmark datasets, MSCOCO and MPII.
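The idea of using a Gaussian filter to handle multiple peaks around the maximum activation can be illustrated with a small, dependency-free sketch. This is not the authors' heatmap enhancer, whose details are not given in the abstract; it is a generic smoothing step applied to a synthetic heatmap. The function names, the 32×32 heatmap size, the joint location, the spike location, and the sigma values are all assumptions made for illustration.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(grid, kernel):
    """Convolve every row with the kernel; pixels outside the grid count as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def gaussian_smooth(heatmap, sigma=2.0):
    """Separable Gaussian filter: smooth along rows, then along columns."""
    kernel = gaussian_kernel_1d(sigma, int(3 * sigma))
    rows_done = smooth_rows(heatmap, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))


def argmax_2d(heatmap):
    """Return (x, y) of the largest heatmap value."""
    best = max((v, x, y) for y, row in enumerate(heatmap)
               for x, v in enumerate(row))
    return best[1], best[2]


def make_demo_heatmap(width=32, height=32, joint=(16, 12), spike=(20, 12)):
    """Synthetic prediction (assumed data): a broad, blurred bump centred on
    the joint plus a single-pixel spurious spike that exceeds the bump's peak."""
    jx, jy = joint
    heatmap = [[0.6 * math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2.0 * 3.0 ** 2))
                for x in range(width)] for y in range(height)]
    sx, sy = spike
    heatmap[sy][sx] += 0.5
    return heatmap


raw = make_demo_heatmap()
print(argmax_2d(raw))                   # (20, 12): the raw maximum sits on the spike
print(argmax_2d(gaussian_smooth(raw)))  # (16, 12): after smoothing, the joint location wins
```

In this toy case the raw maximum is pulled onto the isolated spike, while the smoothed heatmap recovers the centre of the broad response. Smoothing also lowers the absolute heatmap values, so a practical enhancer would likely need some form of rescaling, which this sketch omits.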

Data availability

The datasets generated and/or analysed during the current study are available in the COCO and MPII repositories: https://cocodataset.org/ and http://human-pose.mpi-inf.mpg.de/.

Acknowledgements

The research was supported by the National Natural Science Foundation of China under Grant 62267007 and by the Gansu Provincial Department of Education Higher Education Industry Support Plan Project under Grant 2022CYZC-16.

Author information

Contributions

Hengrui Zhang wrote the main manuscript text. Jia Liu was responsible for proofreading. Yongfeng Qi provided assistance.

Corresponding author

Correspondence to Hengrui Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Chenggang Yan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qi, Y., Zhang, H. & Liu, J. More accurate heatmap generation method for human pose estimation. Multimedia Systems 30, 180 (2024). https://doi.org/10.1007/s00530-024-01390-0

  • DOI: https://doi.org/10.1007/s00530-024-01390-0