Elsevier

Expert Systems with Applications

Volume 41, Issue 14, 15 October 2014, Pages 6305-6314

Recovering 3D human pose based on biomechanical constraints, postures comfort and image shading

https://doi.org/10.1016/j.eswa.2014.03.049

Highlights

  • We introduce a set of biomechanical constraints to reduce the number of postures.

  • We verify that in 341 images the correct posture is ranked in the first 10 positions.

  • For 92% of the images, the model's ranking included the correct answer.

Abstract

This paper presents a new model to identify 3D human poses in pictures, given a single input image. The proposed approach builds on a well-known model from the literature, adding biomechanical restrictions that reduce the number of possible 3D postures that correctly represent the pose in the 2D image. Since the generated set can still contain more than one possible posture, we propose a ranking system that suggests the best generated postures according to a “comfort” criterion and to shading characteristics of the image. The comfort criterion adopts assumptions in terms of pose equilibrium, while the shading criterion eliminates ambiguities among postures by taking the image illumination into account. We emphasize that the removal of ambiguous 3D poses associated with a single image is the main focus of this work. The achieved results were analyzed through visual inspection by users as well as against a state-of-the-art technique, and indicate that our model contributes to the solution of this challenging problem.

Introduction

The way a person poses in front of a camera can indicate his/her emotions, attitudes and intentions. Recovering the 3D human pose from video streams or images can be useful in areas such as sport performance analysis, automatic search in image databases, avatar reconstruction, and person identification systems, among other applications. Indeed, many applications can benefit from technology that deals with data coming from single images or videos. However, despite the large range of applications, this is still an open research field due to the variability of possible human movements, partial occlusions and the restrictions imposed by the loss of information when the 3D world is mapped into a single 2D image. Considerable effort has been devoted to solving such problems. Several papers surveying the state of the art in the area are available in the literature, providing good summaries of the models currently being developed (Agarwal and Triggs, 2006, Moeslund et al., 2006). The recovery of human poses can deal with information from a single image, where there is no depth information, or from video streams, which add motion and time information to the process. The output of pose recovery algorithms can also vary in nature, being a pose in two or three dimensions, depending on the needs of the solution.

An important problem in 3D human pose recovery is ambiguity. Such ambiguity arises when estimating three-dimensional positions from a 2D image, since many 3D postures can present the same 2D projection, and it is inherent to the loss of depth information in 2D images (Hua, Yang, & Wu, 2005). Many authors (Jiang, 2010, Pishchulin et al., 2012, Wei and Jinxiang, 2009, Fergie and Galata, 2013) mention this as one of the main problems in obtaining the 3D human posture. In the study developed by Agarwal and Triggs (2004), the authors define it as an intrinsic challenge of estimating 3D poses. Ambiguity has been dealt with in several ways. For instance, Wei and Jinxiang (2009) propose a method that uses a set of biomechanical restrictions on the angles of the body joints to eliminate ambiguity. In the study by Lee and Cohen (2006), the ambiguity is handled through an approach that employs Markov Chains. Moeslund et al. (2006) use techniques based on kinematic and movement constraints to treat ambiguity based on motion capture. Moreover, in the state of the art we find methods that focus on a particular controlled situation, usually requiring databases containing posture samples. This is the case of the model proposed by Mori, Ren, Efros, and Malik (2004), who developed an approach for obtaining poses of baseball players.
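To make the ambiguity concrete, the short Python sketch below (our own illustration, not code from the paper, and assuming a simple scaled-orthographic camera) shows that flipping the sign of a joint's relative depth leaves its 2D projection unchanged, so the image alone cannot distinguish the two postures. The joint coordinates are arbitrary.

    import numpy as np

    def project_orthographic(joint_3d, scale=1.0):
        # Scaled-orthographic projection: the depth coordinate is simply dropped.
        x, y, _z = joint_3d
        return np.array([scale * x, scale * y])

    # Two candidate 3D positions for the same joint: bent towards or away from the camera.
    elbow_towards_camera = np.array([0.3, 0.5, +0.2])    # positive relative depth
    elbow_away_from_camera = np.array([0.3, 0.5, -0.2])  # negative relative depth

    same_projection = np.allclose(project_orthographic(elbow_towards_camera),
                                  project_orthographic(elbow_away_from_camera))
    print(same_projection)  # True: both 3D postures map to the same 2D image point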

In this paper, we propose a new model to handle the ambiguity problem, initially using a set of biomechanical restrictions to obtain a set of possible 3D poses from 2D images. The resulting set is then ranked based on a comfort criterion that encodes assumptions in terms of pose equilibrium. In addition, a luminosity criterion that considers the lighting of the original image is used to improve the ranking of the set of postures. It is important to emphasize that our paper does not target one specific application, e.g. baseball players (Mori et al., 2004); instead, our scope is focused on frontal poses in images without perspective deformation. The main contributions are the use of the biomechanical model, which reduces the number of possible 3D poses by discarding impossible postures, and the ranking of the most appropriate poses based on posture comfort and luminosity of the scene. While the biomechanical model and the posture comfort deal with joint positions, the luminosity of the scene considers the pixels of the original processed image. The experimental results show that our model breaks new ground in the area when compared to a competing approach.
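As a rough sketch of how such a pipeline can be organized (our own illustration under simplified assumptions: the paper's actual constraint set, comfort criterion and shading criterion are defined in the full text, and the fields angles, com_offset and predicted_shading used here are hypothetical placeholders), candidate poses are first filtered by biomechanical limits and the survivors are then sorted by a weighted combination of comfort and shading scores:

    def is_biomechanically_valid(pose, limits):
        # Keep only poses whose joint angles fall inside anatomical limits.
        return all(lo <= a <= hi for a, (lo, hi) in zip(pose["angles"], limits))

    def comfort_score(pose):
        # Placeholder comfort criterion: penalize the offset of the body's
        # center of mass from its support (a stand-in for pose equilibrium).
        return -abs(pose["com_offset"])

    def shading_score(pose, observed_shading):
        # Placeholder shading criterion: penalize disagreement between the
        # shading predicted for the candidate pose and the observed image shading.
        return -abs(pose["predicted_shading"] - observed_shading)

    def rank_poses(candidates, limits, observed_shading, w_comfort=0.5, w_shading=0.5):
        valid = [p for p in candidates if is_biomechanically_valid(p, limits)]
        return sorted(valid,
                      key=lambda p: w_comfort * comfort_score(p)
                                    + w_shading * shading_score(p, observed_shading),
                      reverse=True)

    # Toy usage: two candidate postures for the same 2D image; the second is
    # discarded because its joint angle violates the biomechanical limit.
    limits = [(0.0, 150.0)]
    candidates = [{"angles": [30.0], "com_offset": 0.1, "predicted_shading": 0.8},
                  {"angles": [170.0], "com_offset": 0.4, "predicted_shading": 0.2}]
    ranked = rank_poses(candidates, limits, observed_shading=0.7)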

This paper is organized as follows: Section 2 presents an overview of the main techniques currently developed. In Section 3 we present our model to recover 3D human poses from single images. In Section 4, experimental results are discussed. Finally, Section 5 discusses the limitations and conclusions of the proposed model.

Section snippets

Related work

Although many efforts have been made to obtain 3D human poses, there is still no well-defined taxonomy in the literature for dealing with this problem. Hu, Wang, Lin, and Yan (2009) suggest two main categories: models that obtain the human posture from video sequences (Agarwal and Triggs, 2006, Chen et al., 2011, Lee and Nevatia, 2007, Menier et al., 2006) and models that recover the pose from static images such as photographs, which is the case of the present work. The models based on static

Proposed model

The problem of estimating the 3D pose of a person in videos has received special attention in the computer vision literature, as discussed in Section 2. This is partly due to the fact that solutions to this problem can be employed in a wide range of applications. However, less attention has been given to the problem of determining the 3D human pose based on a single image. In fact, this problem is a challenge because the restrictions of 2D images are often not sufficient to determine poses
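The snippet above is truncated on this preview page. As a rough sketch of how a candidate set of 3D poses can be enumerated when, as suggested by the Results section, each joint's relative depth ΔZ is known up to its sign (our own illustration, not the paper's implementation; the dummy constraint stands in for the real biomechanical limits), all sign assignments are generated and then pruned:

    from itertools import product

    def generate_candidates(delta_z_magnitudes):
        # Enumerate every combination of depth signs: 2^N candidates for N joints.
        n = len(delta_z_magnitudes)
        for signs in product((+1, -1), repeat=n):
            yield [s * m for s, m in zip(signs, delta_z_magnitudes)]

    def prune(candidates, is_valid):
        # Discard candidates that violate the biomechanical constraints.
        return [c for c in candidates if is_valid(c)]

    # Toy example with three joints and a dummy constraint.
    magnitudes = [0.10, 0.25, 0.05]
    valid = prune(generate_candidates(magnitudes), lambda c: c[0] >= 0)
    print(len(valid))  # 4 of the 8 sign combinations survive this toy constraint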

Results

In this section we describe the experiments performed in order to validate the presented model. Initially, we selected 430 images containing people in various postures. These images were obtained from databases available on the Internet following the work of Bourdev and Malik (2009), Dalal and Triggs (2005) and Ferrari et al. (2008). For each image, the ground truth of the human pose in the image was generated (defined through the sign of ΔZ of each joint i). The ground truth process
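A minimal sketch of the rank-based evaluation implied by the highlights above (our own, for illustration only, and not necessarily the authors' exact protocol): for each image, check whether the ground-truth ΔZ sign pattern appears among the first k ranked candidates, then report the fraction of images for which it does.

    def top_k_hit_rate(ranked_candidates_per_image, ground_truth_per_image, k=10):
        # ranked_candidates_per_image[i] is the ranked candidate list for image i;
        # ground_truth_per_image[i] is that image's ground-truth sign pattern.
        hits = sum(1 for ranked, gt in zip(ranked_candidates_per_image,
                                           ground_truth_per_image)
                   if gt in ranked[:k])
        return hits / len(ground_truth_per_image)

    # For example, 341 hits out of 430 images corresponds to a top-10 rate of about 0.79.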

Conclusions

This paper described a model for the recovery of 3D poses from a single 2D image. We verified, through a literature investigation, that some challenges in this area are still open and there is no definitive solution to the problem. Characteristics such as perspective, lighting, noise, partial occlusions, different clothes and the ambiguity among 3D poses are examples of challenges that still need to be solved. In particular, the approach proposed in this paper aimed to minimize the ambiguous 3D postures

References (31)

  • Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In IEEE conference on computer vision and pattern recognition (CVPR).
  • Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3D human pose annotations. In IEEE 12th international conference on computer vision (ICCV).
  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition (CVPR).
  • Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008). Progressive search space reduction for human pose estimation. In IEEE conference on computer vision and pattern recognition (CVPR).
  • Hornung, A., Deckers, E., & Kobbelt, L. (2007). Character animation from 2D pictures and 3D motion data. In ACM Transactions on Graphics.
