Human pose estimation using deep learning: review, methodologies, progress and future research directions

  • Trends and Surveys
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

Over the past decade, human pose estimation (HPE) has developed into a vibrant research field with a variety of real-world applications, such as 3D reconstruction, virtual testing and person re-identification. Information about human poses is also a critical component in many downstream tasks, such as activity recognition and movement tracking. This review focuses on the key aspects of deep learning in the development of both 2D and 3D HPE. It provides detailed information on the databases, performance metrics and human body models used to implement HPE methodologies. It discusses a variety of applications of HPE across domains such as activity recognition, animation and gaming, virtual reality and video tracking. The paper presents an analytical study of the major works that use deep learning methods for downstream tasks in each domain, for both 2D and 3D HPE. Finally, it discusses open issues and limitations in current HPE research and recommends potential future research directions for making meaningful progress in this area.

Availability of data and materials

Not applicable.

References

  1. Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118

  2. Li M, Chen S, Chen X, Zhang Y, Wang Y, Tian Q (2019) Actional-structural graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3595–3603

  3. Yan A, Wang Y, Li Z, Qiao Y (2019) Pa3d: pose-action 3d machine for video recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7922–7931

  4. Huang L, Huang Y, Ouyang W, Wang L (2019) Part-aligned pose-guided recurrent network for action recognition. Pattern Recogn 92:165–176

    Article  Google Scholar 

  5. Luvizon DC, Picard D, Tabia H (2018) 2d/3d pose estimation and action recognition using multitask deep learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5137–5146

  6. Choi H, Moon G, Lee KM (2020) Pose2mesh: graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In: European conference on computer vision. Springer, pp 769–787

  7. Kundu JN, Rakesh M, Jampani V, Venkatesh RM, Venkatesh Babu R (2020) Appearance consensus driven self-supervised human mesh recovery. In: European conference on computer vision. Springer, pp 794–812

  8. Samet N, Akbas E (2021) Hprnet: hierarchical point regression for whole-body human pose estimation. arXiv preprint arXiv:2106.04269

  9. Kanazawa A, Black MJ, Jacobs DW, Malik J (2018) End-to-end recovery of human shape and pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7122–7131

  10. Cimen G, Maurhofer C, Sumner B, Guay M (2018) Ar poser: automatically augmenting mobile pictures with digital avatars imitating poses. In: 12th international conference on computer graphics, visualization, computer vision and image processing

  11. Elhayek A, Kovalenko O, Murthy P, Malik J, Stricker D (2018) Fully automatic multi-person human motion capture for vr applications. In: International conference on virtual reality and augmented reality. Springer, pp 28–47

  12. Tzimiropoulos G (2015) Project-out cascaded regression with an application to face alignment. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3659–3667

  13. Terven JR, Córdova-Esparza DM (2021) Kinz an azure kinect toolkit for python and matlab. Sci Comput Program 102702

  14. Tölgyessy M, Dekan M, Chovanec L (2021) Skeleton tracking accuracy and precision evaluation of kinect v1, kinect v2, and the azure kinect. Appl Sci 11(12):5756

    Article  Google Scholar 

  15. Kumarapu L, Mukherjee P (2021) Animepose: multi-person 3d pose estimation and animation. Pattern Recogn Lett 147:16–24

    Article  Google Scholar 

  16. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  17. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99

    Google Scholar 

  18. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105

    Google Scholar 

  19. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Lawrence ZC (2014) Microsoft coco: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision - ECCV 2014. Springer, Cham, pp 740–755

    Chapter  Google Scholar 

  20. Joo H, Simon T, Li X, Liu H, Tan L, Gui L, Banerjee S, Godisart T, Nabbe B, Matthews I et al (2017) Panoptic studio: a massively multiview system for social interaction capture. IEEE Trans Pattern Anal Mach Intell 41(1):190–204

    Article  Google Scholar 

  21. Mehta D, Rhodin H, Casas D, Fua P, Sotnychenko O, Xu W, Theobalt C (2017) Monocular 3d human pose estimation in the wild using improved cnn supervision. In: 2017 international conference on 3D vision (3DV). IEEE, pp 506–516

  22. Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) Smpl: a skinned multi-person linear model. ACM transactions on graphics (TOG) 34(6):1–16

    Article  Google Scholar 

  23. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), volume 1. IEEE, pp 886–893

  24. Bourdev L, Malik J (2009) Poselets: body part detectors trained using 3d human pose annotations. In: 2009 IEEE 12th international conference on computer vision, pp 1365–1372

  25. Bourdev L, Maji S, Brox T, Malik J (2010) Detecting people using mutually consistent poselet activations. In: European conference on computer vision. Springer, pp 168–181

  26. Song L, Yu G, Yuan J, Liu Z (2021) Human pose estimation and its application to action recognition: a survey. J Vis Commun Image Represent, 103055

  27. Felzenszwalb PF, Huttenlocher DP (2005) Pictorial structures for object recognition. Int J Comput Vis 61(1):55–79

    Article  Google Scholar 

  28. Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. CVPR 2011:1385–1392

    Google Scholar 

  29. Wang C, Wang Y, Yuille AL (2013) An approach to pose-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 915–922

  30. Li D, Chen X, Zhang Z, Huang K (2018) Pose guided deep model for pedestrian attribute recognition in surveillance scenarios. In: 2018 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  31. Wei S-E, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional pose machines. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4724–4732

  32. Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 466–481

  33. Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703

  34. Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299

  35. Newell A, Huang Z, Deng J (2016) Associative embedding: end-to-end learning for joint detection and grouping. arXiv preprint arXiv:1611.05424

  36. Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L (2020) Higherhrnet: scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5386–5395

  37. Liu Z, Zhu J, Jiajun B, Chen C (2015) A survey of human pose estimation: the body parts parsing based methods. J Vis Commun Image Represent 32:10–19

    Article  Google Scholar 

  38. Gong W, Zhang X, Gonzàlez J, Sobral A, Bouwmans T, Changhe T, Zahzah E (2016) Human pose estimation from monocular images: a comprehensive survey. Sensors 16(12):1966

    Article  Google Scholar 

  39. Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European conference on computer vision. Springer, pp 483–499

  40. Fang H-S, Xie S, Tai Y-W, Lu C (2017) Rmpe: regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 2334–2343

  41. Jin S, Xu L, Xu J, Wang C, Liu W, Qian C, Ouyang W, Luo P (2020) Whole-body human pose estimation in the wild. In: European conference on computer vision. Springer, pp 196–214

  42. Liu W, Chen J, Li C, Qian C, Chu X, Hu X (2018) A cascaded inception of inception network with attention modulated feature fusion for human pose estimation. In: Thirty-second AAAI conference on artificial intelligence

  43. Duan H, Lin K-Y, Jin S, Liu W, Qian C, Ouyang W (2019) Trb: a novel triplet representation for understanding 2d human body. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9479–9488

  44. Kreiss S, Bertoni L, Alahi A (2019) Pifpaf: composite fields for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11977–11986

  45. Jin S, Liu W, Xie E, Wang W, Qian C, Ouyang W, Luo P (2020) Differentiable hierarchical graph grouping for multi-person pose estimation. In: European conference on computer vision. Springer, pp 718–734

  46. Jin S, Liu W, Ouyang W, Qian C (2019) Multi-person articulated tracking with spatial and temporal embeddings. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5664–5673

  47. Zhang H-B, Lei Q, Zhong B-N, Du J-X, Peng J (2016) A survey on human pose estimation. Intell Autom Soft Comput 22(3):483–489

    Article  Google Scholar 

  48. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2016) Deep learning for visual understanding: a review. Neurocomputing 187:27–48

    Article  Google Scholar 

  49. Dang Q, Yin J, Wang B, Zheng W (2019) Deep learning based 2d human pose estimation: a survey. Tsinghua Sci Technol 24(6):663–676

    Article  Google Scholar 

  50. Wang P, Li W, Ogunbona P, Wan J (2018) and Sergio Escalera. A survey, Rgb-d-based human motion recognition with deep learning

  51. Munea TL, Jembre YZ, Weldegebriel HT, Chen L, Huang C, Yang C (2020) The progress of human pose estimation: a survey and taxonomy of models applied in 2d human pose estimation. IEEE Access 8:133330–133348

    Article  Google Scholar 

  52. Chen Y, Tian Y, He M (2020) Monocular human pose estimation: a survey of deep learning-based methods. Comput Vis Image Underst 192:102897

    Article  Google Scholar 

  53. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740–755

  54. Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4903–4911

  55. Luo Z, Wang Z, Huang Y, Wang L, Tan T, Zhou E (2021) Rethinking the heatmap regression for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13264–13273

  56. Johnson S, Everingham M (2010) Clustered pose and nonlinear appearance models for human pose estimation. In: bmvc, vol 2, p 5. Citeseer

  57. Tang W, Wu Y (2019) Does learning specific features for related parts help human pose estimation? In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1107–1116

  58. Sapp B, Taskar B (2013) Modec: multimodal decomposable models for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3674–3681

  59. Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–3693

  60. Nie X, Feng J, Xing J, Yan S (2018) Pose partition networks for multi-person pose estimation. In: Proceedings of the European conference on computer vision (eccv), pp 684–699

  61. Li J, Wang C, Zhu H, Mao Y, Fang H-S, Lu C (2019) Crowdpose: efficient crowded scenes pose estimation and a new benchmark. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10863–10872

  62. Tian C, Yu R, Zhao X, Xia W, Wang H, Yang Y (2021) Posedet: fast multi-person pose estimation using pose embedding. In: 2021 16th IEEE international conference on automatic face and gesture recognition (FG 2021). IEEE, pp 1–8

  63. Geng Z, Sun K, Xiao B, Zhang Z, Wang J (2021) Bottom-up human pose estimation via disentangled keypoint regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14676–14686

  64. Zhang W, Zhu M, Derpanis KG (2013) From actemes to action: a strongly-supervised representation for detailed action understanding. In: Proceedings of the IEEE international conference on computer vision, pp 2248–2255

  65. Artacho B, Savakis A (2021) Omnipose: a multi-scale framework for multi-person pose estimation. arXiv preprint arXiv:2103.10180

  66. Yang D, Wang Y, Dantcheva A, Garattoni L, Francesca G, Bremond F (2021) Unik: a unified framework for real-world skeleton-based action recognition. arXiv preprint arXiv:2107.08580

  67. Andriluka M, Iqbal U, Insafutdinov E, Pishchulin L, Milan A, Gall J, Schiele B (2018) Posetrack: a benchmark for human pose estimation and tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5167–5176

  68. Liu Z, Feng R, Chen H, Wu S, Gao Y, Gao Y, Wang X (2022) Temporal feature alignment and mutual information maximization for video-based human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11006–11016

  69. Kreiss S, Bertoni L, Alahi A (2021) Openpifpaf: composite fields for semantic keypoint detection and spatio-temporal association. IEEE Trans Intell Transport Syst

  70. Ionescu C, Papava D, Olaru V, Sminchisescu C (2013) Human 3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339

    Article  Google Scholar 

  71. Sun X, Xiao B, Wei F, Liang S, Wei Y (2018) Integral human pose regression. In: Proceedings of the European conference on computer vision (ECCV), pp 529–545

  72. Sárándi I, Linder T, Arras KO, Leibe B (2020) Metric-scale truncation-robust heatmaps for 3d human pose estimation. In: 2020 15th IEEE international conference on automatic face and gesture recognition (FG 2020). IEEE, pp 407–414

  73. Li S, Ke L, Pratama K, Tai Y-W, Tang C-K, Cheng K-T (2020) Cascaded deep monocular 3d human pose estimation with evolutionary training data. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6173–6183

  74. Zhao L, Peng X, Tian Y, Kapadia M, Metaxas DN (2019) Semantic graph convolutional networks for 3d human pose regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3425–3435

  75. Arnab A, Doersch C, Zisserman A (2019) Exploiting temporal context for 3d human pose estimation in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3395–3404

  76. Yang W, Ouyang W, Wang X, Ren J, Li H, Wang X (2018) 3d human pose estimation in the wild by adversarial learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5255–5264

  77. Joo H, Liu H, Tan L, Gui L, Nabbe B, Matthews I, Kanade T, Nobuhara S, Sheikh Y (2015) Panoptic studio: a massively multiview system for social motion capture. In: Proceedings of the IEEE international conference on computer vision, pp 3334–3342

  78. Tu H, Wang C, Zeng W (2020) Voxelpose: towards multi-camera 3d human pose estimation in wild environment. In: Computer vision—ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, pp 197–212

  79. Nibali A, He Z, Morgan S, Prendergast L (2019) 3d human pose estimation with 2d marginal heatmaps. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1477–1485

  80. Mehta D, Sotnychenko O, Mueller F, Xu W, Sridhar S, Pons-Moll G, Theobalt C (2018) Single-shot multi-person 3d pose estimation from monocular rgb. In: 2018 international conference on 3D vision (3DV). IEEE, pp 120–130

  81. Zhou K, Han X, Jiang N, Jia K, Lu J (2019) Hemlets pose: learning part-centric heatmap triplets for accurate 3d human pose estimation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2344–2353

  82. Trumble M, Gilbert A, Malleson C, Hilton A, Collomosse J (2017) Total capture: 3d human pose estimation fusing video and inertial sensors. In: Proceedings of 28th British machine vision conference, pp 1–13. University of Surrey

  83. Yi X, Zhou Y, Feng X (2021) Transpose: real-time 3d human translation and pose estimation with six inertial sensors. ACM Trans Gr 40(4):1–13

    Article  Google Scholar 

  84. Zhang Z, Wang C, Qiu W, Qin W, Zeng W (2021) Adafuse: adaptive multiview fusion for accurate human pose estimation in the wild. Int J Comput Vis 129(3):703–718

    Article  Google Scholar 

  85. Varol G, Romero J, Martin X, Mahmood N, Black MJ, Laptev I, Schmid C (2017) Learning from synthetic humans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 109–117

  86. Leinen F, Cozzolino V, Schön T (2021) Volnet: estimating human body part volumes from a single rgb image. arXiv preprint arXiv:2107.02259

  87. Lassner C, Romero J, Kiefel M, Bogo F, Black MJ, Gehler Peter V (2017) Unite the people: closing the loop between 3d and 2d human representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6050–6059

  88. Sengupta A, Budvytis I, Cipolla R (2021) Hierarchical kinematic probability distributions for 3d human shape and pose estimation from images in the wild. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 11219–11229

  89. Zeng W, Ouyang W, Luo P, Liu W, Wang X (2020) 3d human mesh regression with dense correspondence. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7054–7063

  90. Fabbri M, Lanzi F, Calderara S, Palazzi A, Vezzani R, Cucchiara R (2018) Learning to detect and track visible and occluded body joints in a virtual world. In: Proceedings of the European conference on computer vision (ECCV), pp 430–446

  91. Cheng Y, Wang B, Yang B, Tan RT (2021) Monocular 3d multi-person pose estimation by integrating top-down and bottom-up networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7649–7659

  92. Meinhardt T, Kirillov A, Leal-Taixe L, Feichtenhofer C (2022) Trackformer: multi-object tracking with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8844–8854

  93. von Marcard T, Henschel R, Black MJ, Rosenhahn B, Pons-Moll G (2018) Recovering accurate 3d human pose in the wild using imus and a moving camera. In: Proceedings of the European conference on computer vision (ECCV), pp 601–617

  94. Zeng A, Ju X, Yang L, Gao R, Zhu X, Dai B, Xu Q (2022) Deciwatch: a simple baseline for 10x efficient 2d and 3d pose estimation. arXiv preprint arXiv:2203.08713

  95. Xu J, Yu Z, Ni B, Yang J, Yang X, Zhang W (2020) Deep kinematics analysis for monocular 3d human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 899–908

  96. Mahmood N, G, Troje NF, Pons-Moll G, Black MJ (2019) Amass: archive of motion capture as surface shapes. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5442–5451

  97. Bouazizi A, Holzbock A, Kressel U, Dietmayer K, Belagiannis V (2022) Motionmixer: mlp-based 3d human body pose forecasting. arXiv preprint arXiv:2207.00499

  98. Hong F, Zhang M, Pan L, Cai Z, Yang L, Liu Z (2022) Avatarclip: zero-shot text-driven generation and animation of 3d avatars. arXiv preprint arXiv:2205.08535

  99. Cao Z, Gao H, Mangalam K, Cai Q-Z, Vo M, Malik J (2020) Long-term human motion prediction with scene context. In: European conference on computer vision. Springer, pp 387–404

  100. Mohamed A, Chen H, Wang Z, Claudel C (2021) Skeleton-graph: long-term 3d motion prediction from 2d observations using deep spatio-temporal graph cnns. arXiv preprint arXiv:2109.10257

  101. Sarafianos N, Boteanu B, Ionescu B, Kakadiaris IA (2016) 3d human pose estimation: a review of the literature and analysis of covariates. Comput Vis Image Underst 152:1–20

    Article  Google Scholar 

  102. Moon G, Chang JY, Lee KM (2019) Camera distance-aware top-down approach for 3d multi-person pose estimation from a single rgb image. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10133–10142

  103. Lin K, Wang L, Liu Z (2021) End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1954–1963

  104. Zheng C, Wu W, Yang T, Zhu S, Chen C, Liu R, Shen J, Kehtarnavaz N, Shah M (2020) Deep learning-based human pose estimation: a survey. arXiv preprint arXiv:2012.13392

  105. Tome D, Russell C, Agapito L (2017) Lifting from the deep: convolutional 3d pose estimation from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2500–2509

  106. Sidenbladh H, De la Torre F, Black MJ (2000) A framework for modeling the appearance of 3d articulated figures. In: Proceedings fourth IEEE international conference on automatic face and gesture recognition (Cat. No. PR00580). IEEE, pp 368–375

  107. Anguelov D, Srinivasan P, Koller D, Thrun S, Rodgers J, Davis J (2005) Scape: shape completion and animation of people. In: ACM SIGGRAPH 2005 papers, pp 408–416

  108. Joo H, Simon T, Sheikh Y (2018) Total capture: a 3d deformation model for tracking faces, hands, and bodies. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8320–8329

  109. Alp Guler R, Trigeorgis G, Antonakos E, Snape P, Zafeiriou S, Kokkinos I (2017) Densereg: fully convolutional dense shape regression in-the-wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6799–6808

  110. Cootes TF, Taylor CJ, Cooper DH, Graham J (1995) Active shape models-their training and application. Comput Vis Image Underst 61(1):38–59

    Article  Google Scholar 

  111. Ju SX, Black MJ, Yacoob Y (1996) Cardboard people: a parameterized model of articulated image motion. In: Proceedings of the second international conference on automatic face and gesture recognition. IEEE, pp 38–44

  112. Zuffi S, Freifeld O, Black MJ (2012) From pictorial structures to deformable structures. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3546–3553

  113. Mehta D, Sridhar S, Sotnychenko O, Rhodin H, Shafiei M, Seidel HP, Xu W, Casas D, Theobalt C (2017) Vnect: real-time 3d human pose estimation with a single rgb camera. ACM Trans Gr 36(4):1–14

    Article  Google Scholar 

  114. Dantone M, Gall J, Leistner C, Van Gool L (2013) Human pose estimation using body parts dependent joint regressors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3041–3048

  115. Chen X, Yuille A (2014) Articulated pose estimation by a graphical model with image dependent pairwise relations. arXiv preprint arXiv:1407.3399

  116. Gkioxari G, Hariharan B, Girshick R, Malik J (2014) Using k-poselets for detecting people and localizing their keypoints. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3582–3589

  117. Cai Y, Wang Z, Luo Z, Yin B, Du A, Wang H, Zhang X, Zhou X, Zhou E, Sun J (2020) Learning delicate local representations for multi-person pose estimation. In: European conference on computer vision. Springer, pp 455–472

  118. Cao Z, Simon T, Wei SE, Sheikh Y (2019) Openpose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell 43(1):172–186

    Article  Google Scholar 

  119. Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7103–7112

  120. Li W, Wang Z, Yin B, Peng Q, Du Y, Xiao T, Yu G, Lu H, Wei Y, Sun J (2019) Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148

  121. Tian Z, Chen H, Shen C (2019) Directpose: direct end-to-end multi-person pose estimation. arXiv preprint arXiv:1911.07451

  122. Sun X, Shang J, Liang S, Wei Y (2017) Compositional human pose regression. In: Proceedings of the IEEE international conference on computer vision, pp 2602–2611

  123. Huang J, Zhu Z, Guo F, Huang G (2020) The devil is in the details: delving into unbiased data processing for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5700–5709

  124. Carreira J, Agrawal P, Fragkiadaki K, Malik J (2016) Human pose estimation with iterative error feedback. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4733–4742

  125. Nie X, Feng J, Zhang J, Yan S (2019) Single-stage multi-person pose machines. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6951–6960

  126. Toshev A, Szegedy C (2014) Deeppose: human pose estimation via deep neural networks’. CVPR (Columbus, Ohio), pp 1653–1660

  127. Tompson JJ, Arjun J, Yann L, Christoph B (2014) Joint training of a convolutional network and a graphical model for human pose estimation. Adv Neural Inf Process Syst 27:1799–1807

    Google Scholar 

  128. Andriluka M, Roth S, Schiele B (2009) Pictorial structures revisited: people detection and articulated pose estimation. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1014–1021

  129. Toshev A, Szegedy C (2014) Deeppose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660

  130. Su K, Yu D, Xu Z, Geng X, Wang C (2019) Multi-person pose estimation with enhanced channel-wise and spatial information. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5674–5682

  131. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

  132. Sun M, Kohli P, Shotton J (2012) Conditional regression forests for human pose estimation. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3394–3401

  133. Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 588–595

  134. Tang W, Yu P, Wu Y (2018) Deeply learned compositional models for human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 190–206

  135. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850

  136. Li J, Wen S, Wang Z (2020) Simple pose: rethinking and improving a bottom-up approach for multi-person pose estimation. Proceedings of the AAAI conference on artificial intelligence 34:11354–11361

    Article  Google Scholar 

  137. Wei F, Sun X, Li H, Wang J, Lin S (2020) Point-set anchors for object detection, instance segmentation and pose estimation. In: European conference on computer vision. Springer, pp 527–544

  138. Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler PV, Schiele B (2016) Deepcut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937

  139. Kocabas M, Karagoz S, Akbas E (2018) Multiposenet: fast multi-person pose estimation using pose residual network. In: Proceedings of the European conference on computer vision (ECCV), pp 417–433

  140. Papandreou G, Zhu T, Chen L-C, Gidaris S, Tompson J, Murphy K (2018) Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In: Proceedings of the European conference on computer vision (ECCV), pp 269–286

  141. Luo Y, Xu Z, Liu P, Du Y, Guo J-M (2018) Multi-person pose estimation via multi-layer fractal network and joints kinship pattern. IEEE Trans Image Process 28(1):142–155

    Article  MathSciNet  MATH  Google Scholar 

  142. Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) Deepercut: a deeper, stronger, and faster multi-person pose estimation model. In: European conference on computer vision. Springer, pp 34–50

  143. Martinez J, Hossain R, Romero J, Little JJ (2017) A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 2640–2649

  144. Hogg D (1983) Model-based vision: a program to see a walking person. Image Vis Comput 1(1):5–20

    Article  Google Scholar 

  145. O’rourke J, Badler NI (1980) Model-based image analysis of human motion using constraint propagation. IEEE Trans Pattern Anal Mach Intell 6:522–536

    Article  Google Scholar 

  146. Chen C-H, Ramanan D (2017) 3d human pose estimation= 2d pose estimation+ matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7035–7043

  147. Tekin B, Katircioglu I, Salzmann M, Lepetit V, Fua P (2016) Structured prediction of 3d human pose with deep neural networks. arXiv preprint arXiv:1605.05180

  148. Pavlakos G, Zhou X, Derpanis KG, Daniilidis K (2017) Coarse-to-fine volumetric prediction for single-image 3d human pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7025–7034

  149. Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al. (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell

  150. Alp Güler R, Neverova N, Kokkinos I (2018) Densepose: dense human pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7297–7306

  151. Jiang W, Kolotouros N, Pavlakos G, Zhou X, Daniilidis K (2020) Coherent reconstruction of multiple humans from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5579–5588

  152. Andriluka M, Roth S, Schiele B (2010) Monocular 3d pose estimation and tracking by detection. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 623–630

  153. Moreno-Noguer F (2017) 3d human pose estimation from a single image via distance matrix regression. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2823–2832

  154. Belagiannis V, Amin S, Andriluka M, Schiele B, Navab N, Ilic S (2015) 3d pictorial structures revisited: multiple human pose estimation. IEEE Trans Pattern Anal Mach Intell 38(10):1929–1942


  155. Ershadi-Nasab S, Noury E, Kasaei S, Sanaei E (2018) Multiple human 3d pose estimation from multiview images. Multimed Tools Appl 77(12):15573–15601


  156. Tome D, Toso M, Agapito L, Russell C (2018) Rethinking pose in 3d: multi-stage refinement and recovery for markerless motion capture. In: 2018 international conference on 3D vision (3DV). IEEE, pp 474–483

  157. Zhang Y, An L, Yu T, Li X, Li K, Liu Y (2020) 4d association graph for realtime multi-person motion capture using multiple video cameras. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1324–1333

  158. Chen L, Ai H, Chen R, Zhuang Z, Liu S (2020) Cross-view tracking for multi-human 3d pose estimation at over 100 fps. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3279–3288

  159. Lee K, Lee I, Lee S (2018) Propagating lstm: 3d pose estimation based on joint interdependency. In: Proceedings of the European conference on computer vision (ECCV), pp 119–135

  160. Hossain MRI, Little JJ (2018) Exploiting temporal information for 3d human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 68–84

  161. Nie BX, Wei P, Zhu S-C (2017) Monocular 3d human pose estimation by predicting depth on joints. In: 2017 IEEE international conference on computer vision (ICCV). IEEE, pp 3467–3475

  162. Pavlakos G, Zhou X, Daniilidis K (2018) Ordinal depth supervision for 3d human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7307–7316

  163. Yasin H, Iqbal U, Kruger B, Weber A, Gall J (2016) A dual-source approach for 3d pose estimation from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4948–4956

  164. Dabral R, Mundhada A, Kusupati U, Afaque S, Sharma A, Jain A (2018) Learning 3d human pose from structure and motion. In: Proceedings of the European conference on computer vision (ECCV), pp 668–683

  165. Tekin B, Márquez-Neila P, Salzmann M, Fua P (2017) Learning to fuse 2d and 3d image cues for monocular body pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 3941–3950

  166. Sárándi I, Linder T, Arras KO, Leibe B (2018) Synthetic occlusion augmentation with volumetric heatmaps for the 2018 eccv posetrack challenge on 3d human pose estimation. arXiv preprint arXiv:1809.04987

  167. Rogez G, Weinzaepfel P, Schmid C (2019) Lcr-net++: multi-person 2d and 3d pose detection in natural images. IEEE Trans Pattern Anal Mach Intell 42(5):1146–1161


  168. Zanfir A, Marinoiu E, Sminchisescu C (2018) Monocular 3d pose and shape estimation of multiple people in natural scenes-the importance of multiple scene constraints. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2148–2157

  169. Mehta D, Sotnychenko O, Mueller F, Xu W, Elgharib M, Fua P, Seidel H-P, Rhodin H, Pons-Moll G, Theobalt C (2019) Xnect: real-time multi-person 3d human pose estimation with a single rgb camera. arXiv preprint arXiv:1907.00837

  170. Remelli E, Han S, Honari S, Fua P, Wang R (2020) Lightweight multi-view 3d pose estimation through camera-disentangled representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6040–6049

  171. Qiu H, Wang C, Wang J, Wang N, Zeng W (2019) Cross view fusion for 3d human pose estimation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4342–4351

  172. Andrew AM (2001) Multiple view geometry in computer vision. Kybernetes

  173. Iskakov K, Burkov E, Lempitsky V, Malkov Y (2019) Learnable triangulation of human pose. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7718–7727

  174. Chen H, Guo P, Li P, Lee GH, Chirikjian G (2020) Multi-person 3d pose estimation in crowded scenes based on multi-view geometry. In: European conference on computer vision. Springer, pp 541–557

  175. Dong J, Jiang W, Huang Q, Bao H, Zhou X (2019) Fast and robust multi-person 3d pose estimation from multiple views. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7792–7801

  176. Huang C, Jiang S, Li Y, Zhang Z, Traish J, Deng C, Ferguson S, Xu RY (2020) End-to-end dynamic matching network for multi-view multi-person 3d pose estimation. In: European conference on computer vision. Springer, pp 477–493

  177. Kadkhodamohammadi A, Padoy N (2021) A generalizable approach for multi-view 3d human pose regression. Mach Vis Appl 32(1):1–14


  178. Svensén M, Bishop CM (2007) Pattern recognition and machine learning

  179. Belagiannis V, Amin S, Andriluka M, Schiele B, Navab N, Ilic S (2014) 3d pictorial structures for multiple human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1669–1676

  180. Zhong Z, Zheng L, Zheng Z, Li S, Yang Y (2018) Camera style adaptation for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5157–5166

  181. Li S, Chan AB (2014) 3d human pose estimation from monocular images with deep convolutional neural network. In: Asian conference on computer vision. Springer, pp 332–347

  182. Li S, Zhang W, Chan AB (2015) Maximum-margin structured learning with deep networks for 3d human pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 2848–2856

  183. Rogez G, Weinzaepfel P, Schmid C (2017) Lcr-net: localization-classification-regression for human pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3433–3441

  184. Luo C, Chu X, Yuille A (2018) Orinet: a fully convolutional network for 3d human pose estimation. arXiv preprint arXiv:1811.04989

  185. Fang HS, Xu Y, Wang W, Liu X, Zhu SC (2018) Learning pose grammar to encode human body configuration for 3d pose estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32

  186. Mehta D, Sotnychenko O, Mueller F, Xu W, Elgharib M, Fua P, Seidel HP, Rhodin H, Pons-Moll G, Theobalt C (2020) Xnect: real-time multi-person 3d motion capture with a single rgb camera. ACM Trans Gr 39(4):82–91


  187. Rhodin H, Spörri J, Katircioglu I, Constantin V, Meyer F, Müller E, Salzmann M, Fua P (2018) Learning monocular 3d human pose estimation from multi-view images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8437–8446

  188. Wandt B, Rosenhahn B (2019) Repnet: weakly supervised training of an adversarial reprojection network for 3d human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7782–7791

  189. Wang C, Kong C, Lucey S (2019) Distill knowledge from nrsfm for weakly supervised 3d pose learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 743–752

  190. Kundu JN, Seth S, Jampani V, Rakesh M, Venkatesh BR, Chakraborty A (2020) Self-supervised 3d human pose estimation via part guided novel image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6152–6162

  191. Zanfir A, Bazavan EG, Xu H, Freeman WT, Sukthankar RSC (2020) Weakly supervised 3d human pose and shape reconstruction with normalizing flows. In: European conference on computer vision. Springer, pp 465–481

  192. Chen Z, Liu X, Sheng B, Li P (2020) Garnet: graph attention residual networks based on adversarial learning for 3d human pose estimation. In: Computer graphics international conference. Springer, pp 276–287

  193. Habekost J, Shiratori T, Ye Y, Komura T, Shi M, Aberman K, Aristidou A, Lischinski D, Cohen-Or D, Chen B et al. (2020) Learning 3d global human motion estimation from unpaired, disjoint datasets. In: BMVC

  194. Xiaohan Nie B, Xiong C, Zhu S-C (2015) Joint action recognition and pose estimation from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1293–1301

  195. Cao C, Zhang Y, Zhang C, Hanqing L (2017) Body joint guided 3-d deep convolutional descriptors for action recognition. IEEE Trans Cybern 48(3):1095–1108


  196. Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal lstm with trust gates for 3d human action recognition. In: European conference on computer vision. Springer, pp 816–833

  197. Liu J, Shahroudy A, Xu D, Wang G (2017) Deep multimodal feature analysis for action recognition in rgb+d videos. IEEE Trans Pattern Anal Mach Intell 40(5):1045–1058


  198. Baradel F, Wolf C, Mille J (2017) Pose-conditioned spatio-temporal attention for human action recognition. arXiv preprint arXiv:1703.10106

  199. Raaj Y, Idrees H, Hidalgo G, Sheikh Y (2019) Efficient online multi-person 2d pose tracking with recurrent spatio-temporal affinity fields. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4620–4628

  200. Girdhar R, Gkioxari G, Torresani L, Paluri M, Tran D (2018) Detect-and-track: efficient pose estimation in videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 350–359

  201. Ramachandran A, Karuppiah A (2020) A survey on recent advances in wearable fall detection systems. BioMed Res Int

  202. Khan SS, Hoey J (2017) Review of fall detection techniques: a data availability perspective. Med Eng Phys 39:12–22


  203. Ma X, Wang H, Xue B, Zhou M, Ji B, Li Y (2014) Depth-based human fall detection via shape features and improved extreme learning machine. IEEE J Biomed Health Inform 18(6):1915–1922


  204. Geertsema EE, Visser GH, Viergever MA, Kalitzin SN (2019) Automated remote fall detection using impact features from video and audio. J Biomech 88:25–32


  205. Mastorakis G, Makris D (2014) Fall detection system using kinect’s infrared sensor. J Real Time Image Proc 9(4):635–646


  206. Yajai A, Rasmequan S (2017) Adaptive directional bounding box from rgb-d information for improving fall detection. J Vis Commun Image Represent 49:257–273


  207. Ciabattoni L, Foresi G, Monteriù A, Proietti Pagnotta D, Tomaiuolo L (2018) Fall detection system by using ambient intelligence and mobile robots. In: 2018 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 130–131

  208. Núñez-Marcos A, Azkune G, Arganda-Carreras I (2017) Vision-based fall detection with convolutional neural networks. Wirel Commun Mobile Comput

  209. Han Q, Zhao H, Min W, Cui H, Zhou X, Zuo K, Liu R (2020) A two-stream approach to fall detection with mobilevgg. IEEE Access 8:17556–17566


  210. Na L, Yidan W, Feng L, Song J (2018) Deep learning for fall detection: three-dimensional cnn combined with lstm on video kinematic data. IEEE J Biomed Health Inform 23(1):314–323


  211. Sajjan S, Moore M, Pan M, Nagaraja G, Lee J, Zeng A, Song S (2020) Clear grasp: 3d shape estimation of transparent objects for manipulation. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 3634–3642

  212. Escalona F, Martinez-Martin E, Cruz E, Cazorla M, Gomez-Donoso F (2020) Eva: evaluating at-home rehabilitation exercises using augmented reality and low-cost sensors. Virtual Real 24(4):567–581


  213. Shi D, Jiang X (2021) Sport training action correction by using convolutional neural network. Internet Technol Lett 4(3):e261


  214. Wang J, Qiu K, Peng H, Fu J, Zhu J (2019) Ai coach: deep human pose estimation and analysis for personalized athletic training assistance. In: Proceedings of the 27th ACM international conference on multimedia, pp 374–382

  215. Insafutdinov E, Andriluka M, Pishchulin L, Tang S, Levinkov E, Andres B, Schiele B (2017) Arttrack: articulated multi-person tracking in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6457–6465

  216. Jin S, Ma X, Han Z, Wu Y, Yang W, Liu W, Qian C, Ouyang W (2017) Towards multi-person pose tracking: bottom-up and top-down methods. In: ICCV posetrack workshop 2:7

  217. Xiu Y, Li J, Wang H, Fang Y, Lu C (2018) Pose flow: efficient online pose tracking. arXiv preprint arXiv:1802.00977

  218. Doering A, Iqbal U, Gall J (2018) Joint flow: temporal flow fields for multi person tracking. arXiv preprint arXiv:1805.04596

  219. Li J, Xu C, Chen Z, Bian S, Yang L, Lu C (2021) Hybrik: a hybrid analytical-neural inverse kinematics solution for 3d human pose and shape estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3383–3393

  220. Lin K, Wang L, Liu Z (2021) Mesh graphormer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 12939–12948

  221. Yuan Y, Iqbal U, Molchanov P, Kitani K, Kautz J (2022) Glamr: global occlusion-aware human mesh recovery with dynamic cameras. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11038–11049

  222. Kundu JN, Seth S, Ym P, Jampani V, Chakraborty A, Babu RV (2022) Uncertainty-aware adaptation for self-supervised 3d human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 20448–20459

  223. Khirodkar R, Tripathi S, Kitani K (2022) Occluded human mesh recovery. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1715–1725

  224. Li Z, Wang X, Wang F, Jiang P (2019) On boosting single-frame 3d human pose estimation via monocular videos. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2192–2201

  225. Khurana T, Dave A, Ramanan D (2021) Detecting invisible people. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3174–3184

  226. Jiang T, Camgoz NC, Bowden R (2021) Skeletor: skeletal transformers for robust body-pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3394–3402

  227. Choi H, Moon G, Chang JY, Lee KM (2021) Beyond static features for temporally consistent 3d human pose and shape from a video. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1964–1973

  228. Jiao J, Cao Y, Song Y, Lau R (2018) Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Proceedings of the European conference on computer vision (ECCV), pp 53–69

  229. Long X, Lin C, Liu L, Li W, Theobalt C, Yang R, Wang W (2021) Adaptive surface normal constraint for depth estimation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 12849–12858

  230. Park J, Joo K, Hu Z, Liu C-K, Kweon IS (2020) Non-local spatial propagation network for depth completion. In: European conference on computer vision. Springer, pp 120–136

  231. Xiong X, Xiong H, Xian K, Zhao C, Cao Z, Li X (2020) Sparse-to-dense depth completion revisited: sampling strategy and graph construction. In: European conference on computer vision. Springer, pp 682–699

  232. Qu C, Liu W, Taylor CJ (2021) Bayesian deep basis fitting for depth completion with uncertainty. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 16147–16157

  233. Reddy ND, Guigues L, Pishchulin L, Eledath J, Narasimhan SG (2021) Tessetrack: end-to-end learnable multi-person articulated 3d pose tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15190–15200

  234. Wu S, Jin S, Liu W, Bai L, Qian C, Liu D, Ouyang W (2021) Graph-based 3d multi-person pose estimation using multi-view images. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 11148–11157

  235. Zhang Y, Wang C, Wang X, Liu W, Zeng W (2022) Voxeltrack: multi-person 3d human pose estimation and tracking in the wild. IEEE Trans Pattern Anal Mach Intell

  236. Johnson WR, Alderson J, Lloyd D, Mian A (2018) Predicting athlete ground reaction forces and moments from spatio-temporal driven cnn models. IEEE Trans Biomed Eng 66(3):689–694


  237. Alcantara RS, Edwards WB, Millet GY, Grabowski AM (2022) Predicting continuous ground reaction forces from accelerometers during uphill and downhill running: a recurrent neural network solution. PeerJ 10:e12752


  238. McGinley JL, Baker R, Wolfe R, Morris ME (2009) The reliability of three-dimensional kinematic gait measurements: a systematic review. Gait Posture 29(3):360–369


  239. Morris C, Mundt M, Goldacre M, Weber J, Mian A, Alderson J (2021) Predicting 3d ground reaction force from 2d video via neural networks in sidestepping tasks. ISBS Proc Arch 39(1):300


  240. Yu H, Xu Y, Zhang J, Zhao W, Guan Z, Tao D (2021) Ap-10k: a benchmark for animal pose estimation in the wild. arXiv preprint arXiv:2108.12617

  241. Mathis A, Biasi T, Schneider S, Yuksekgonul M, Rogers B, Bethge M, Mathis MW (2021) Pretraining boosts out-of-domain robustness for pose estimation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1859–1868

  242. Graving JM, Chae D, Naik H, Li L, Koger B, Costelloe BR, Couzin ID (2019) Deepposekit, a software toolkit for fast and robust animal pose estimation using deep learning. Elife 8:e47994


  243. Labuguen R, Matsumoto J, Negrete SB, Nishimaru H, Nishijo H, Takada M, Go Y, Inoue KI, Shibata T (2021) Macaquepose: a novel “in the wild” macaque monkey pose dataset for markerless motion capture. Front Behav Neurosci 14:581154


  244. Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS, Murthy M, Shaevitz JW (2019) Fast animal pose estimation using deep neural networks. Nat Methods 16(1):117–125


  245. Li S, Li J, Tang H, Qian R, Lin W (2019) Atrw: a benchmark for amur tiger re-identification in the wild. arXiv preprint arXiv:1906.05586

  246. Hendrycks D, Dietterich T (2019) Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261

  247. Michaelis C, Mitzkus B, Geirhos R, Rusak E, Bringmann O, Ecker AS, Bethge M, Brendel W (2019) Benchmarking robustness in object detection: autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484

  248. Kamann C, Rother C (2020) Benchmarking the robustness of semantic segmentation models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8828–8838

  249. Liu W, Mei T (2022) Recent advances of monocular 2d and 3d human pose estimation: a deep learning perspective. ACM Comput Surv

  250. Wang J, Jin S, Liu W, Liu W, Qian C, Luo P (2021) When human pose estimation meets robustness: adversarial algorithms and benchmarks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11855–11864

  251. Zheng C, Wu W, Yang T, Zhu S, Chen C, Liu R, Shen J, Kehtarnavaz N, Shah M (2020) Deep learning-based human pose estimation: a survey. CoRR, arXiv:2012.13392

  252. Charles J, Pfister T, Magee D, Hogg D, Zisserman A (2016) Personalizing human video pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3063–3072

  253. Liu Z, Chen H, Feng R, Wu S, Ji S, Yang B, Wang X (2021) Deep dual consecutive network for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 525–534

  254. Xu L, Jin S, Liu W, Qian C, Ouyang W, Luo P, Wang X (2022) Zoomnas: searching for whole-body human pose estimation in the wild. IEEE Trans Pattern Anal Mach Intell

  255. Zhang D, Wu Y, Guo M, Chen Y (2021) Deep learning methods for 3d human pose estimation under different supervision paradigms: a survey. Electronics 10(18):2267

  256. Wang C, Zhang F, Ge SS (2021) A comprehensive survey on 2d multi-person pose estimation methods. Eng Appl Artif Intell 102:104260


  257. Giryes R, Sapiro G, Bronstein AM (2014) On the stability of deep networks. arXiv preprint arXiv:1412.5896

  258. Zheng S, Song Y, Leung T, Goodfellow I (2016) Improving the robustness of deep neural networks via stability training. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4480–4488

  259. Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P (2017) Universal adversarial perturbations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1765–1773

  260. Haber E, Ruthotto L (2017) Stable architectures for deep neural networks. Inverse Prob 34(1):014004


  261. Chen R, Chen H, Ren J, Huang G, Zhang Q (2019) Explaining neural networks semantically and quantitatively. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9187–9196

  262. Zhang Y, Tiňo P, Leonardis A, Tang K (2021) A survey on neural network interpretability. IEEE Trans Emerg Top Comput Intell

  263. Liu J, Akhtar N, Mian A (2020) Adversarial attack on skeleton-based human action recognition. IEEE Trans Neural Netw Learn Syst


Acknowledgements

Not applicable.

Funding

Not applicable.

Author information


Corresponding author

Correspondence to Pranjal Kumar.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Yes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kumar, P., Chauhan, S. & Awasthi, L.K. Human pose estimation using deep learning: review, methodologies, progress and future research directions. Int J Multimed Info Retr 11, 489–521 (2022). https://doi.org/10.1007/s13735-022-00261-6



  • DOI: https://doi.org/10.1007/s13735-022-00261-6
