Multi-person pose estimation based on graph grouping optimization

Multimedia Tools and Applications

Abstract

Multi-person pose estimation has become an increasingly popular topic with advances in computer vision and human-machine interaction, and progress in this field can further improve the understanding of human poses and activities. Current mainstream multi-person pose estimation methods fall into two categories: top-down and bottom-up. Top-down methods achieve better accuracy by reducing the problem to single-person pose estimation, but this strategy greatly increases time complexity as a trade-off. Bottom-up methods directly locate all keypoints in the image, which is potentially more efficient and can run in real time. However, most current bottom-up methods separate keypoint detection and grouping into two independent steps, which greatly hinders the overall performance and computational efficiency of the algorithms. To address this issue, our study proposes an end-to-end bottom-up framework for multi-person pose estimation. Using HRNet as the backbone, we add a deconvolution module to obtain high-resolution feature maps in the keypoint proposal stage. A graph neural network is used in the grouping stage and is integrated into the backbone so that the whole framework can be trained end to end. Using the keypoint candidates as nodes, two discriminators are exploited to supervise the grouping process. Lastly, a graph-based pose optimization algorithm is explored to refine the results. Experiments on the COCO and CrowdPose datasets show that our method achieves better accuracy and greatly reduces computation time.
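To make the grouping idea concrete, the following is a minimal sketch of bottom-up grouping with keypoint candidates as graph nodes. It is not the method of the paper: the learned graph neural network and its discriminators are replaced here by a hand-written distance score, and the names `group_keypoints`, `distance_score`, `threshold`, and `scale` are hypothetical, chosen only for illustration. Edges are scored between candidates of different joint types, and the strongest edges are merged greedily as long as a group never ends up with two joints of the same type.

```python
from itertools import combinations


def distance_score(a, b, scale=50.0):
    """Hypothetical stand-in for a learned edge score: closer pairs score higher."""
    d = ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5
    return 1.0 / (1.0 + d / scale)


def group_keypoints(candidates, edge_score=distance_score, threshold=0.5):
    """Greedily group keypoint candidates into person instances.

    candidates: list of (joint_type, x, y) tuples, the graph nodes.
    Returns a sorted list of groups, each a sorted list of candidate indices.
    """
    n = len(candidates)
    # Score every edge between candidates of different joint types.
    edges = []
    for i, j in combinations(range(n), 2):
        if candidates[i][0] != candidates[j][0]:
            s = edge_score(candidates[i], candidates[j])
            if s >= threshold:
                edges.append((s, i, j))
    edges.sort(reverse=True)  # strongest edges first

    group_of = list(range(n))            # node index -> group id
    members = {i: [i] for i in range(n)}  # group id -> node indices
    for _, i, j in edges:
        gi, gj = group_of[i], group_of[j]
        if gi == gj:
            continue
        types_i = {candidates[k][0] for k in members[gi]}
        types_j = {candidates[k][0] for k in members[gj]}
        if types_i & types_j:
            continue  # a person cannot have two joints of the same type
        for k in members[gj]:
            group_of[k] = gi
        members[gi].extend(members.pop(gj))
    return sorted(sorted(g) for g in members.values())


# Two people, each with a nose and a neck candidate.
candidates = [("nose", 10, 10), ("neck", 12, 30),
              ("nose", 200, 10), ("neck", 202, 30)]
print(group_keypoints(candidates))  # -> [[0, 1], [2, 3]]
```

In an end-to-end framework such as the one the abstract describes, the edge score would come from the trained network rather than from pixel distance, so that detection and grouping can be optimized jointly.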

Author information

Corresponding author

Correspondence to Yingsong Hu.

Ethics declarations

Conflict of interests

No conflicts of interests in this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zeng, Q., Hu, Y., Li, D. et al. Multi-person pose estimation based on graph grouping optimization. Multimed Tools Appl 82, 7039–7053 (2023). https://doi.org/10.1007/s11042-022-13445-3

  • DOI: https://doi.org/10.1007/s11042-022-13445-3
