Recovering 3D human pose based on biomechanical constraints, postures comfort and image shading
Introduction
The way a person poses in front of a camera can indicate his or her emotions, attitudes and intentions. Recovering the 3D human pose from video streams or images can be useful in areas such as sports performance analysis, automatic search in image databases, avatar reconstruction and person identification systems, among other applications. Indeed, many applications can benefit from a technology that handles data coming from single images or videos. Despite this large range of applications, however, the field remains open due to the variability of possible human movements, partial occlusions and the restrictions imposed by the loss of information when the 3D world is mapped onto a single 2D image. Considerable effort has been made to solve these problems. Several papers surveying the state of the art in the area are available in the literature, providing good summaries of the models currently being developed (Agarwal and Triggs, 2006, Moeslund et al., 2006). The recovery of human poses can rely on information from a single image, where there is no depth information, or from a video stream, which adds motion and temporal information to the process. The output of pose recovery algorithms can also vary in nature, being a pose in two or three dimensions, depending on the needs of the solution.
An important problem in 3D human pose recovery is ambiguity. Such ambiguity arises when estimating three-dimensional positions from a 2D image, since many 3D postures can present the same 2D projection; it is inherent to the loss of depth information in 2D images (Hua, Yang, & Wu, 2005). Many authors (Jiang, 2010, Pishchulin et al., 2012, Wei and Jinxiang, 2009, Fergie and Galata, 2013) mention this as one of the main obstacles to obtaining the 3D human posture. In the study developed by Agarwal and Triggs (2004), the authors describe it as an intrinsic challenge of 3D pose estimation. Ambiguity is dealt with in several ways. For instance, Wei and Jinxiang (2009) propose a method that uses a set of biomechanical restrictions on the angles of the body joints to eliminate ambiguity. In the study by Lee and Cohen (2006), ambiguity is handled through an approach that employs Markov chains. Moeslund et al. (2006) describe techniques based on kinematic and movement constraints, grounded in motion capture, to treat ambiguity. Moreover, the state of the art includes methods that focus on a particular controlled situation, usually requiring databases containing posture samples; this is the case of the model proposed by Mori, Ren, Efros, and Malik (2004), who developed an approach for obtaining poses of baseball players.
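To make the depth ambiguity concrete, the sketch below assumes a scaled orthographic camera: a limb of known length whose 2D projection is observed admits two relative depths (pointing toward or away from the camera), so a skeleton with n limbs can yield up to 2^n candidate 3D poses that all share the same image. The helper name and the projection model are illustrative assumptions, not the formulation of any cited work.

```python
import math

def limb_depth_candidates(du, dv, limb_length, scale=1.0):
    """Under scaled orthographic projection, a limb of known 3D length
    projecting onto the 2D image offset (du, dv) admits two relative
    depths, +dz and -dz: the limb may point toward or away from the
    camera. (Hypothetical helper illustrating why depth is ambiguous.)"""
    planar_sq = (du / scale) ** 2 + (dv / scale) ** 2
    residual = limb_length ** 2 - planar_sq
    if residual < 0:
        return []  # inconsistent: projection is longer than the limb allows
    dz = math.sqrt(residual)
    return [dz, -dz]

# A skeleton with n limbs therefore yields up to 2**n candidate 3D poses
# sharing one 2D image, e.g. here dz = sqrt(5^2 - 3^2) = 4:
candidates = limb_depth_candidates(3.0, 0.0, 5.0)  # → [4.0, -4.0]
```

This is the ambiguous candidate set that the constraints discussed below are meant to shrink.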
In this paper, we propose a new model to handle the ambiguity problem: we initially use a set of biomechanical restrictions to obtain a set of possible 3D poses from a 2D image. The resulting set is then ranked based on a comfort criterion that encodes assumptions about pose equilibrium. In addition, a luminosity criterion that considers the lighting of the original image is used to refine the ranking of the set of postures. It is important to emphasize that our work does not target a specific application, such as baseball players (Mori et al., 2004); rather, our scope is focused on frontal poses in images without perspective deformation. The main contributions are the use of the biomechanical model, which reduces the number of possible 3D poses by discarding impossible postures, and the ranking of the most appropriate poses based on posture comfort and scene luminosity. While the biomechanical model and posture comfort deal with joint positions, the scene luminosity criterion considers the pixels of the original processed image. The experimental results show that our model breaks new ground in the area when compared to the competing approach.
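As a rough illustration of how biomechanical restrictions shrink the ambiguous candidate set, the sketch below discards candidate poses whose joint angles fall outside anatomically feasible ranges. The numeric limits and the dictionary-based pose representation are simplified assumptions for illustration only; the actual model uses its own constraint set.

```python
# Hypothetical joint-angle limits in degrees (flexion ranges), loosely
# inspired by biomechanical tables; illustrative values, not the paper's.
JOINT_LIMITS = {
    "elbow": (0.0, 145.0),  # the elbow cannot hyperextend backwards
    "knee":  (0.0, 140.0),
}

def is_plausible(pose_angles, limits=JOINT_LIMITS):
    """A candidate pose is plausible only if every constrained joint
    angle lies inside its anatomically feasible range."""
    return all(limits[j][0] <= a <= limits[j][1]
               for j, a in pose_angles.items() if j in limits)

def prune(candidate_poses):
    """Keep only biomechanically feasible candidates, shrinking the
    ambiguous set of up to 2**n poses before any ranking step."""
    return [p for p in candidate_poses if is_plausible(p)]

poses = [{"elbow": 30.0, "knee": 10.0},    # feasible
         {"elbow": -20.0, "knee": 10.0}]   # hyperextended elbow → discarded
surviving = prune(poses)  # keeps only the first pose
```

The surviving poses would then be ordered by the comfort and luminosity criteria described above.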
This paper is organized as follows: Section 2 presents an overview of the main techniques currently developed. In Section 3 we present our model to recover 3D human poses from single images. In Section 4, experimental results are discussed. Finally, Section 5 presents the limitations and conclusions of the proposed model.
Section snippets
Related work
Although much effort has been devoted to obtaining 3D human poses, there is still no well-defined taxonomy in the literature for this problem. Hu, Wang, Lin, and Yan (2009) suggest two main categories: models that obtain the human posture from video sequences (Agarwal and Triggs, 2006, Chen et al., 2011, Lee and Nevatia, 2007, Menier et al., 2006) and models that recover the pose from static images such as photographs, which is the case of the present work. The models based on static
Proposed model
The problem of estimating the 3D pose of a person in videos has received special attention in the computer vision literature, as discussed in Section 2. This is partly due to the fact that solutions to this problem can be employed in a wide range of applications. However, less attention has been given to the problem of determining the 3D human pose from a single image. In fact, this problem is challenging because the restrictions of 2D images are often not sufficient to determine poses
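One ingredient of the proposed model, the comfort-based ranking of candidate poses, can be pictured as follows. The equilibrium proxy used below (horizontal offset between the centroid of all joints and the midpoint between the ankles) is purely a hypothetical stand-in for the paper's actual comfort criterion, as are the joint names.

```python
def equilibrium_score(joints_3d):
    # Hypothetical comfort proxy: a well-balanced pose keeps its mass
    # centered over its support, so a smaller horizontal offset between
    # the joint centroid and the midpoint of the ankles reads as "more
    # comfortable". (Illustrative assumption, not the paper's metric.)
    xs = [p[0] for p in joints_3d.values()]
    centroid_x = sum(xs) / len(xs)
    support_x = (joints_3d["l_ankle"][0] + joints_3d["r_ankle"][0]) / 2.0
    return abs(centroid_x - support_x)

def rank_by_comfort(candidate_poses):
    # Order the (already biomechanically pruned) candidates so that the
    # most balanced pose comes first.
    return sorted(candidate_poses, key=equilibrium_score)
```

In the full model, a luminosity term derived from the image shading would further refine this ordering.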
Results
In this section we describe the experiments performed to validate the presented model. Initially, we selected 430 images containing people in various postures. These images were obtained from databases available on the Internet, following the works of Bourdev and Malik (2009), Dalal and Triggs (2005), and Ferrari et al. (2008). For each image, the ground truth of the human pose was generated (defined through the sign of each joint i). The ground truth process
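A validation against such ground truth is typically reported as the mean distance between estimated and annotated joint positions. The sketch below assumes 2D joint coordinates keyed by joint name; it is one plausible error measure for this kind of evaluation, not necessarily the one used in the experiments.

```python
import math

def mean_joint_error(estimated, ground_truth):
    """Mean Euclidean distance over all annotated joints between the
    estimated pose and the ground-truth pose; lower is better.
    (Illustrative metric, assumed for this sketch.)"""
    errors = [math.dist(estimated[j], ground_truth[j]) for j in ground_truth]
    return sum(errors) / len(errors)

gt  = {"head": (0.0, 0.0), "l_hand": (0.0, 0.0)}
est = {"head": (0.0, 0.0), "l_hand": (3.0, 4.0)}
error = mean_joint_error(est, gt)  # (0 + 5) / 2 = 2.5
```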
Conclusions
This paper described a model for the recovery of 3D poses from a single 2D image. Our literature review shows that several challenges in this area remain open and that there is no definitive solution to the problem. Characteristics such as perspective, lighting, noise, partial occlusions, different clothes and the ambiguity of 3D poses are examples of challenges that still need to be solved. In particular, the approach proposed in this paper aimed to minimize the ambiguous 3D postures
References (31)
- 3D human pose recovery from image by efficient visual feature selection. Computer Vision and Image Understanding (2011).
- Mixtures of Gaussian process models for human pose estimation. Image and Vision Computing (2013).
- Motion capture and human pose reconstruction from a single-view video sequence. Digital Signal Processing (2013).
- Recovery of upper body poses in static images based on joints detection. Pattern Recognition Letters (2009).
- Body part detection for human pose estimation and tracking.
- A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding (2006).
- Reconstruction of articulated objects from point correspondences in a single uncalibrated image. Computer Vision and Image Understanding (2000).
- Agarwal, A., & Triggs, B. (2006). A local basis representation for estimating human pose from cluttered images. In...
- 3D human pose from silhouettes by relevance vector regression.
- Recovering 3D human pose from monocular images. Pattern Analysis and Machine Intelligence (2006).