A survey of human pose estimation: The body parts parsing based methods☆
Introduction
Human pose estimation (HPE) is the process of inferring 2D or 3D human body part positions from still images or videos. Conventional HPE methods usually employ extra hardware devices to capture human poses and construct a human skeleton from the captured body joints. These methods are either expensive or inefficient. During the past decade, considerable research effort has been devoted to the HPE problem in the computer vision domain.
Although previous studies have investigated human body part configuration, human body detection and human motion [1], a survey summarizing the most recent progress on body pose estimation is still lacking. In this survey, we mainly review the recent advances in vision-based human pose estimation. Human pose estimation touches nearly all the human-related problems in computer vision, ranging from whole-body pose parsing to detailed body part localization. As it is hard to cover all these fields within a single survey, we focus on the body part parsing methods. To compare the different body part parsing methods more easily, we divide them into four categories: 2D single-person parsing in images, 2D multi-person parsing in images, 2D single-person parsing in videos, and 3D single-person parsing in images and videos. Moreover, we discuss the limitations of the existing approaches and anticipate future trends.
Human pose estimation techniques have become increasingly mature over the past decades. As the topic attracts great interest from different domains, new applications constantly emerge along with the technological evolution. Human pose estimation is not only an important computer vision problem but also plays a critical role in a variety of real-world applications, including the following.
Video Surveillance. Video surveillance aims at tracking and monitoring the locations and motions of pedestrians in special circumstances. It is the earliest application area in which HPE technologies have been used. Common scenes include supermarkets and airport passageways.
Human–Computer Interaction (HCI). Advanced human–computer interaction systems based on human pose estimation have developed rapidly. In these systems, instructions can be analyzed accurately by capturing human body poses. In recent years, intelligent driving has emerged as a novel practical application.
Digital Entertainment. Digital entertainment, including computer games, computer animation and films, has become a huge industry and an active domain in recent years. For instance, people enjoy playing body sensor games. Also, in the pre-production of the special effects for the movie Avatar [2], actors wore special equipment to animate the activities of the Avatars.
Medical Imaging. Human pose estimation has been widely used in automated medical applications. For example, HPE can help doctors check patients' activities through remote monitoring, which greatly simplifies the therapeutic process.
Sports Scenes. In sports news and live broadcasts, human pose estimation is employed to track athletes' locations and activities. Moreover, the estimated poses can be used to analyze the detailed movements of their actions.
Other applications include the military, children's mental development, virtual reality, and so on. The related application fields of HPE are shown in Fig. 1.
In recent years, various devices and commercial systems have been released along with HPE technology, including the Microsoft Kinect sensor [3], [4], Leap Motion [5], body-mounted cameras [6], 3D laser scanners [7] and infrared light sources [8]. These commercial systems have quite different implementation principles and application fields, as shown in Table 1.
Section snippets
Related surveys and overview
During the last decade, several surveys have been published to summarize the related work on human pose estimation. 3D HPE has attracted considerable attention in computer vision. For instance, Hen and Paramesran [13] summarize single-camera 3D pose estimation from images, and Sminchisescu [14] aims to reconstruct 3D human poses from monocular video sequences. Wearable equipment makes it possible to estimate depth in motion capture; Helten et al. [15] review the depth camera based motion
Preprocessing work
The preprocessing stage for HPE includes camera calibration, foreground segmentation and human body detection. In this section, we review the recent advances in these techniques.
Body parts parsing
Body parts parsing aims at locating different body parts in images and is the most important step in human pose estimation. In this section, we review the recent technical advances in parsing human body parts.
Body parsing methods vary from 2D body parsing to 3D body parsing, and from images to videos. For clarity, we divide these methods into four subcategories, which are single person parsing in single 2D images, single person parsing in 3D images/videos, single
Datasets
Due to the large variations across scenes, it is difficult to build a universal dataset for evaluating human pose estimation. Instead, researchers have created many datasets to evaluate their proposed techniques on specific tasks, which makes a fair comparison of different algorithms even harder.
We summarize the current publicly available datasets in Table 5. The HumanEva [90] dataset is made of a number of images capturing the synchronized people performing the
Future work and conclusion
Because it involves challenges from most of the important topics in the computer vision domain, estimating human poses from images and videos remains hard. This survey summarizes the recent research efforts on this problem.
However, these technologies are still limited, especially for irregular poses. A future trend is to explore unsupervised or semi-supervised learning in body parts parsing. Over-segmentation helps preserve contour information, which makes it a promising preprocessing technique.
Acknowledgments
The authors thank the reviewers for their extensive and informative comments, which helped improve this manuscript. This work was supported in part by the National Natural Science Foundation of China under Grant 61103105 and the National High Technology Research and Development Program of China (2013AA040601).
References (104)
- Vision-based human motion analysis: an overview, Comput. Vis. Image Underst. (2007)
- A survey on vision-based human action recognition, Image Vis. Comput. (2010)
- et al., Free viewpoint action recognition using motion history volumes, Comput. Vis. Image Underst. (2006)
- et al., Human body part estimation from depth images via spatially-constrained deep learning, Pattern Recogn. Lett. (2014)
- et al., An adaptable system for RGB-D based human body detection and pose estimation, J. Visual Commun. Image Represent. (2014)
- et al., Parametric annealing: a stochastic search method for human pose tracking, Pattern Recogn. (2013)
- J. Cameron, Avatar....
- et al., Enhanced computer vision with Microsoft Kinect sensor: a review, IEEE Trans. Cybern. (2013)
- et al., Efficient human pose estimation from single depth images, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- et al., Analysis of the accuracy and robustness of the Leap Motion controller, Sensors (2013)
- Segmentation and modeling of full human body shape from 3-D scan data: a survey, IEEE Trans. Syst. Man Cybern. Part C
- Scanning 3D full human bodies using Kinects, IEEE Trans. Visual Comput. Graphics
- Human–computer interaction based on hand gestures using RGB-D sensors, Sensors
- Lean on Wii: Physical rehabilitation with virtual reality and Wii peripherals, Annu. Rev. CyberTherapy Telemedicine
- 2D articulated human pose estimation and retrieval in (almost) unconstrained still images, Int. J. Comput. Vision
- Articulated human detection with flexible mixtures of parts, IEEE Trans. Pattern Anal. Mach. Intell.
☆ This paper has been recommended for acceptance by M.T. Sun.