A Comprehensive Survey on Single-Person Pose Estimation in Social Robotics

  • Survey
  • International Journal of Social Robotics

Abstract

With economic development and rising living standards, social robots are gradually entering people's daily lives. Human–robot interaction is the basic function of social robots, and improving the interaction experience is an important issue in social robotics. Single-person pose estimation (SPPE) is the core technology for human–robot interaction in social robots, and it has made great progress thanks to advances in deep learning. This paper reviews the development of SPPE from four aspects: data augmentation, the evolution of SPPE models, learning targets, and post-processing. It also summarizes the commonly used datasets and evaluation metrics. Finally, it discusses the open problems of SPPE and outlines future research trends.

Author information

Corresponding author

Correspondence to Chen Wang.

About this article

Cite this article

Zhang, F., Zhu, X. & Wang, C. A Comprehensive Survey on Single-Person Pose Estimation in Social Robotics. Int J of Soc Robotics 14, 1995–2008 (2022). https://doi.org/10.1007/s12369-020-00739-5

  • DOI: https://doi.org/10.1007/s12369-020-00739-5
