From circle to 3-sphere: Head pose estimation by instance parameterization

https://doi.org/10.1016/j.cviu.2015.03.008

Highlights

  • Coarse-to-fine framework for 3-dimensional head pose estimation.

  • Parameterizes instance factors in a generative manner.

  • Uniform embedding in a novel direction alleviates manifold degradation.

  • Outperforms state-of-the-art methods on multiple challenging databases.

Abstract

Three-dimensional head pose estimation from a single 2D image is a challenging task with extensive applications. Existing approaches lack the capability to deal with multiple pose-related and -unrelated factors in a uniform way. Most of them provide only 1-dimensional yaw estimation and suffer from limited representation ability on out-of-sample testing inputs. These drawbacks limit performance when faces in the wild exhibit extensive variations. To address these problems, we propose a coarse-to-fine pose estimation framework, in which the unit circle and the 3-sphere are employed to model the manifold topology on the coarse and fine layers, respectively. The framework uniformly factorizes multiple factors in an instance parametric subspace, where novel inputs can be synthesized under a generative framework. Moreover, our approach effectively avoids the manifold degradation problem in 3D pose estimation. Results on both laboratory and in-the-wild databases demonstrate the validity and superior performance of our approach compared with state-of-the-art methods.

Introduction

The task of inferring the orientation of a human head is known as head pose estimation. In the computer vision context, it is specified as a set of processing steps that transform a pixel-based digital image of a head into a high-level concept of direction [1]. Many face analysis tasks rely on accurate head pose estimation. For instance, a multi-pose face recognition system can estimate the head pose first and then select face images with similar poses for matching; a 3D face tracker can use head pose information to render the face model for optimal fitting. Other applications include inferring gaze direction in human–computer interaction (HCI) systems, monitoring driver awareness for safe driving [2] and inferring people's intentions in both verbal and nonverbal communication environments [3].

It is often assumed that the human head is a rigid object, so three Euler angles, pitch, yaw and roll, suffice to describe the head orientation [4]. Estimating the three angles from a single 2D image is challenging, since there exist extensive variations in pose-unrelated factors such as identity, facial expression, illumination and other latent variables. Fig. 1 shows examples of these variations. In many cases, pose-unrelated factors play a more significant role in appearance variation than pose changes do [5], [6], [7]. Therefore, extracting information in which pose changes dominate over pose-unrelated factors is crucial when designing a head pose estimator.
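To make the rigid-head assumption concrete, the three Euler angles can be composed into a single rotation matrix describing the head's orientation. The sketch below (Python with NumPy; the axis assignments and composition order are our illustrative assumption, since conventions vary across head pose databases) builds that matrix:

```python
import numpy as np

def head_rotation(pitch, yaw, roll):
    """Compose a head rotation matrix from Euler angles in radians.

    Assumed convention (for illustration only): pitch about the x-axis,
    yaw about the y-axis, roll about the z-axis, composed as
    R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0,  cx, -sx],
                   [0.0,  sx,  cx]])
    Ry = np.array([[ cy, 0.0,  sy],
                   [0.0, 1.0, 0.0],
                   [-sy, 0.0,  cy]])
    Rz = np.array([[ cz, -sz, 0.0],
                   [ sz,  cz, 0.0],
                   [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx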

Numerous approaches have been proposed over the past decades for head pose estimation. We arrange existing methods into four categories: classification-based approaches [8], [9], regression-based approaches [10], [11], [12], [7], [13], deformable-model-based approaches [14], [15], [16] and manifold-embedding-based approaches [17], [18], [5], [6]. Classification-based approaches are limited to estimating discrete 1-dimensional (yaw-only) head pose. Regression-based approaches can predict 3-dimensional continuous pose efficiently, but they are extremely sensitive to noise and pose-unrelated factors. Deformable-model-based approaches rely on the localization of facial landmarks, which limits their ability to handle extensive instance variations, especially in low-resolution images. Manifold-embedding-based approaches assume that facial images with consecutive head poses can be viewed as nearby points on a low-dimensional manifold embedded in the high-dimensional feature space. Head pose angles can then be recovered by measuring the distribution of points in the manifold embedding space.

Although manifold-embedding-based approaches have achieved great success in prior research, they still suffer from several limitations. First, there is no guarantee that pose-related factors dominate over pose-unrelated factors in the manifold embedding process: pose-unrelated factors distort manifold construction and cause geometric deformation across instance manifolds (different combinations of identity, expression and illumination) [1]. Though various approaches [17], [5], [6] have been proposed to partially solve this problem, they either focus on a single pose-unrelated factor such as identity while ignoring the others [17], or cannot handle multiple pose-unrelated factors in a uniform way [6]. Second, former methods try to learn the mapping from the high-dimensional feature space to the low-dimensional manifold embedding. This mapping direction causes manifold degradation (highly folded or self-intersecting embeddings) [19] when the manifold topology is complicated, as in 3-dimensional pose estimation. Hence, most manifold-embedding-based estimators provide only 1-dimensional yaw estimation and ignore pitch and roll variations. Third, the projections from the image feature space to the low-dimensional manifold are defined only on the training set [20]. The entire embedding procedure has to be repeated for novel data, since these methods lack an efficient way to represent out-of-sample inputs [5].

To address the limitations of existing methods, we propose a manifold-embedding-based coarse-to-fine framework for 3-dimensional head pose estimation. This approach employs the unit circle and the 3-sphere to model the uniform manifold topology on the coarse and fine layers, respectively. By learning instance-dependent nonlinear mappings from the unit circle or 3-sphere to every instance manifold (a certain person with a certain expression under a certain illumination condition), the pose-related and -unrelated factors can be decoupled in a latent instance parametric subspace. The basic idea is that pose-unrelated factors dominate the geometric deformations across different instance manifolds. Hence, we can factorize the instance variations, which are encoded in these geometric deformations, in the instance parametric subspace.
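As a concrete sketch of this uniform geometry representation: the coarse layer can place a yaw angle on the unit circle, and the fine layer can place a full 3-D pose on the 3-sphere. A unit quaternion is one natural way to do the latter; the quaternion composition below is our assumption for illustration, not necessarily the paper's exact parameterization:

```python
import numpy as np

def yaw_to_circle(yaw):
    """Coarse layer: embed a yaw angle (radians) on the unit circle."""
    return np.array([np.cos(yaw), np.sin(yaw)])

def circle_to_yaw(point):
    """Recover yaw from a point on (or near) the unit circle."""
    return np.arctan2(point[1], point[0])

def pose_to_sphere(pitch, yaw, roll):
    """Fine layer: embed a 3-D pose (radians) on the unit 3-sphere.

    Sketch assumption: we take the unit quaternion of the rotation
    Rx(pitch) * Ry(yaw) * Rz(roll) as the point on S^3.
    """
    ca, sa = np.cos(pitch / 2), np.sin(pitch / 2)
    cb, sb = np.cos(yaw / 2), np.sin(yaw / 2)
    cc, sc = np.cos(roll / 2), np.sin(roll / 2)
    return np.array([
        ca * cb * cc - sa * sb * sc,   # w
        sa * cb * cc + ca * sb * sc,   # x
        ca * sb * cc - sa * cb * sc,   # y
        ca * cb * sc + sa * sb * cc,   # z
    ])
```

Because both representations live on compact, smooth manifolds of the correct topology, consecutive poses map to nearby points by construction, which is the property the embedding layers rely on.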

Our approach has several merits. First, the coarse-to-fine framework enables efficient and accurate 3-dimensional continuous head pose estimation. Second, it can parameterize multiple pose-related and -unrelated factors under a uniform framework in the latent space. Third, the designed mapping direction of the manifold embedding, which is the opposite of the conventional direction, effectively avoids the manifold degradation problem in 3-dimensional pose estimation. Last but not least, out-of-sample data can be effectively synthesized in the instance parametric subspace, which gives our approach its generative ability.

The remainder of this paper is organized as follows. We briefly review existing head pose estimation approaches in Section 2. Section 3 elaborates the motivation and details of our approach. In Section 4, we carry out experiments on multiple databases to verify our approach and compare its performance with the state of the art. Section 5 summarizes the paper.

Section snippets

Background

Head pose estimation from visual perception has been a broad and diverse field for decades. To motivate our approach, we summarize existing methods and briefly review the most representative and related works.

Approach

This section describes our approach in detail. First, we discuss the motivation of the coarse-to-fine pose estimation framework. Then we propose the instance parametric subspace and the uniform geometry representation. Instance parameterization is achieved by conducting instance-dependent mappings and pose-related/-unrelated factorization in the subspace. Finally, an efficient pose inference solution is provided to estimate head pose in a testing image. An overview of our approach is
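A minimal sketch of one instance-dependent mapping on the coarse layer, under our own assumptions (the paper's snippet above does not fix the regressor): radial-basis-function regression learns the mapping from the unit circle to one instance's feature manifold, and pose is then inferred for a test feature by minimizing reconstruction error over candidate angles:

```python
import numpy as np

def rbf_features(angles, centers, width=0.5):
    """Gaussian RBF activations of unit-circle points at the given centers."""
    pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    ctr = np.stack([np.cos(centers), np.sin(centers)], axis=1)
    d2 = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_mapping(train_angles, train_feats, centers):
    """Least-squares RBF regression: circle -> instance feature manifold."""
    Phi = rbf_features(train_angles, centers)
    W, *_ = np.linalg.lstsq(Phi, train_feats, rcond=None)
    return W

def infer_yaw(test_feat, W, centers, grid=None):
    """Infer yaw by synthesizing features over a grid of candidate angles
    and picking the one whose reconstruction best matches the input."""
    if grid is None:
        grid = np.linspace(-np.pi, np.pi, 361)
    recon = rbf_features(grid, centers) @ W
    err = ((recon - test_feat) ** 2).sum(axis=1)
    return grid[np.argmin(err)]
```

Because the learned mapping runs from the embedding to the feature space, novel inputs are handled generatively, by synthesis and comparison, rather than by re-running the embedding.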

Experiments

In this section, we carry out a series of experiments to demonstrate the validity of our approach and evaluate its performance. Several state-of-the-art approaches are compared with ours on both laboratory and in-the-wild face databases.

Conclusion

In this paper, we presented a novel head pose estimation approach. We proposed the instance parametric subspace to handle multiple instance variations in a generative way. The coarse-to-fine framework, which employs a unit circle on the coarse layer and a 3-sphere on the fine layer to model the uniform geometry representation, significantly alleviates the manifold degradation problem by learning instance-dependent nonlinear mappings in an unconventional direction. Experiments on both

References (37)

  • V. Balasubramanian, J. Ye, S. Panchanathan, Biased manifold embedding: a framework for person-independent head pose...
  • C. BenAbdelkader, Robust head pose estimation using supervised manifold learning, in: European Conference on Computer...
  • D. Huang, M. Storer, F. De la Torre, H. Bischof, Supervised local subspace learning for continuous head pose...
  • J. Huang, X. Shao, H. Wechsler, Face pose discrimination using support vector machines (SVM), in: International...
  • R. Rae et al., Recognition of human head orientation based on artificial neural networks, IEEE Trans. Neural Netw. (1998)
  • Y. Ma, Y. Konishi, K. Kinoshita, S. Lao, M. Kawade, Sparse bayesian regression for head pose estimation, in:...
  • A. Ranganathan, M.-H. Yang, Online sparse matrix gaussian process regression and vision applications, in: European...
  • M. Haj, J. Gonzalez, L. Davis, On partial least squares in head pose estimation: how to simultaneously deal with...