Pattern Recognition Letters

Volume 45, 1 August 2014, Pages 145-153

Head pose estimation using image abstraction and local directional quaternary patterns for multiclass classification

https://doi.org/10.1016/j.patrec.2014.03.017

Highlights

  • Cartoon-like facial contour images can abstract properties of common facial poses.

  • Our methods reduced the noise caused by variations in identity and facial expressions.

  • Facial binary images delivered good results without any illumination compensation.

  • The LDQP method with image abstraction outperformed state-of-the-art algorithms.

Abstract

This study treats the problem of coarse head pose estimation from a facial image as a multiclass classification problem. Head pose estimation continues to be a challenge for computer vision systems because extraneous characteristics and factors that lack pose information can change the pixel values in facial images. Thus, to ensure robustness against variations in identity, illumination conditions, and facial expressions, we propose an image abstraction method and a new representation method (local directional quaternary patterns, LDQP), which remove unnecessary information and highlight important information during facial pose classification. Experiments verified the efficacy of the proposed methods and demonstrated their robustness against different types of variation in the input images.

Introduction

Natural interaction between people and computers is an important research topic that has recently attracted considerable attention. This research area addresses natural user interfaces (NUIs) between humans and computers. A NUI is a human–machine interface that does not rely on conventional input devices; instead, it supports interaction that resembles communication between people. Various related research areas aim to achieve non-intrusive and natural human–computer interaction (HCI), such as face recognition, facial expression recognition, activity recognition, and gesture recognition. As a starting point for these techniques, head pose estimation is a crucial technology that aims to predict human intentions and thereby facilitates the use of non-verbal cues for communication via NUIs. For example, people estimate the orientation of another person's head to judge whether that person wants to interact with them.

Head pose estimation is a technique that aims to determine three-dimensional (3D) orientation properties from an image of a human head. In 3D space, a rigid body has six degrees of freedom, i.e., three rotations and three translations. Head pose estimation methods are usually designed to extract angular information in terms of the pitch and yaw rotations of a facial image. Pitch and yaw are more difficult to estimate than properties such as the roll angle, two-dimensional (2D) translation, and scale, which can be calculated easily using 2D face detection techniques, because of occlusions by glasses, beards, and hair, as well as changes in head angle. Different identity, illumination, and facial expression conditions are also serious hindrances when extracting the angular properties of head images.
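The pose parameterization is not spelled out in this snippet. Purely as a reference point, if the head orientation is described by a 3 × 3 rotation matrix R (with a translation vector t accounting for the remaining three degrees of freedom) and R = R_z(ψ)R_y(θ)R_x(φ) under the yaw-pitch-roll (ZYX) Euler convention, the three rotation angles can be recovered from the entries of R as:

    \begin{align}
      \text{yaw:}   \quad \psi   &= \operatorname{atan2}\left(R_{21},\, R_{11}\right) \\
      \text{pitch:} \quad \theta &= \operatorname{atan2}\left(-R_{31},\, \sqrt{R_{32}^{2} + R_{33}^{2}}\right) \\
      \text{roll:}  \quad \phi   &= \operatorname{atan2}\left(R_{32},\, R_{33}\right)
    \end{align}

This is one standard convention, not necessarily the one used in the paper; head pose estimation from a single image typically reports only the yaw and pitch components of this decomposition.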

Section snippets

Related works

In recent years, many methods have been developed for estimating 3D human head poses from facial RGB images. These studies can be categorized into classification-based and regression-based methods built on machine learning techniques; the distinction essentially corresponds to whether the pose space is treated as discrete or continuous. The advantages of the classification approaches are that they are comparatively simple and that controlled-pose datasets can be used for training. These …

System overview

Humans can recognize head poses by detecting simple sets of edges, similar to cartoon faces. People can identify simple head poses because they abstract the features of faces intuitively. In particular, people innately recognize the shapes, configurations, and contours of learned features such as eyes, noses, mouths, eyebrows, foreheads, and chins. Thus, people can recall abstracted images of heads by inference from what they have learned. The basic concept of our system was designed …

Image abstraction

Image abstraction removes unnecessary information and emphasizes the main content by reinterpreting scene information. This process helps viewers to capture specific visual information. We use an image abstraction method to interpret facial images; the method is shown in Fig. 2. The proposed algorithm performs GrabCut segmentation [8] using the rectangular area of a face. Next, to generate a cartoon-like effect, bilateral filtering [9] is applied to remove some of the …
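The snippet is cut off before the full pipeline is described. The following Python/OpenCV sketch illustrates only the two steps stated explicitly, GrabCut segmentation initialised with the face rectangle [8] and bilateral filtering for a cartoon-like effect [9]; the filter parameters and the final Canny edge step that produces a binary contour image are assumptions added for completeness, not the paper's exact settings.

    import cv2
    import numpy as np

    def abstract_face(image, face_rect):
        """Cartoon-like abstraction of a detected face region (illustrative sketch).

        image     : BGR input image (numpy array)
        face_rect : (x, y, w, h) rectangle from a face detector
        Returns a binary contour image of the segmented face.
        """
        # 1. GrabCut segmentation initialised with the face rectangle [8].
        mask = np.zeros(image.shape[:2], np.uint8)
        bgd_model = np.zeros((1, 65), np.float64)
        fgd_model = np.zeros((1, 65), np.float64)
        cv2.grabCut(image, mask, face_rect, bgd_model, fgd_model, 5,
                    cv2.GC_INIT_WITH_RECT)
        fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
        face = image * fg[:, :, None]

        # 2. Bilateral filtering for a cartoon-like effect [9]
        #    (smooths texture while preserving strong edges).
        smoothed = cv2.bilateralFilter(face, 9, 75, 75)

        # 3. Binary facial contour image (assumed edge-extraction step).
        gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
        return cv2.Canny(gray, 50, 150)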

Representation

The image abstraction method captures pose information from a facial image, but it may also remove color and texture information that could be valuable for pose estimation. Such information can be regarded as noise during pose estimation, yet it may be useful for locating the eyes, nose, or mouth. To overcome this problem, we propose our new LDQP representation method for binary images. LDQP is a variant of local binary patterns (LBP) [14] …
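The exact LDQP definition is truncated in this snippet. Purely as an illustration of a quaternary (four-state) pattern on a binary image, the sketch below assumes that each centre pixel bit is paired with each of its eight neighbours' bits to form a 2-bit code, and that the codes are pooled into a direction-indexed histogram; this is an assumed reading, not necessarily the paper's exact formulation.

    import numpy as np

    def ldqp_histogram(binary_img):
        """Illustrative quaternary-pattern histogram on a binary image.

        Assumption (not the paper's exact definition): each centre bit is paired
        with each of its 8 neighbours' bits, giving a 2-bit code in {0, 1, 2, 3}
        per direction; codes are accumulated into an 8 x 4 = 32-bin histogram.
        """
        img = (binary_img > 0).astype(np.uint8)
        h, w = img.shape
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]   # 8 neighbour directions
        hist = np.zeros(8 * 4, dtype=np.int64)
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                c = img[y, x]
                for d, (dy, dx) in enumerate(offsets):
                    n = img[y + dy, x + dx]
                    code = (n << 1) | c            # quaternary state: 0..3
                    hist[d * 4 + code] += 1
        return hist / max(hist.sum(), 1)           # normalised descriptor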

Experiments

We evaluated our approach using the Labeled Faces in the Wild (LFW) database [15], the Multi-PIE face database [16], and the Pointing database [17] in qualitative and quantitative experiments. The Viola–Jones face detector, GrabCut, bilateral filtering, and support vector machine (SVM) algorithms were implemented using the OpenCV library [18].
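As a rough illustration of how these OpenCV components fit together, the sketch below chains the Viola–Jones cascade detector, the hypothetical abstract_face and ldqp_histogram helpers from the earlier sketches, and OpenCV's multiclass C-SVC; dataset loading, cross-validation, and parameter selection are omitted, and none of the values shown are the paper's actual settings.

    import cv2
    import numpy as np

    # Viola-Jones face detector shipped with OpenCV [18].
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def extract_descriptor(image):
        """Detect a face, abstract it, and compute the LDQP-style descriptor
        (uses the illustrative abstract_face / ldqp_histogram sketches above)."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]
        contour = abstract_face(image, (int(x), int(y), int(w), int(h)))
        return ldqp_histogram(contour).astype(np.float32)

    def train_pose_classifier(descriptors, pose_labels):
        """Multiclass SVM over discrete pose classes (OpenCV C-SVC)."""
        svm = cv2.ml.SVM_create()
        svm.setType(cv2.ml.SVM_C_SVC)
        svm.setKernel(cv2.ml.SVM_RBF)
        samples = np.vstack(descriptors).astype(np.float32)
        labels = np.asarray(pose_labels, dtype=np.int32)
        svm.train(samples, cv2.ml.ROW_SAMPLE, labels)
        return svm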

Conclusion and future works

In this study, we proposed an image abstraction and representation method for head pose estimation, which we applied successfully to the multiclass classification problem. Cartoon-like facial contour images were used to abstract the characteristics of common facial poses. These images reduced the noise caused by variations in identity, illumination conditions, and facial expressions. The representation method, LDQP, combined with image abstraction, outperformed state-of-the-art methods in terms of …

Acknowledgments

This work was supported by the IT R&D program of MKE & KEIT [10041610, The development of the recognition technology for user identity, behavior and location that has a performance approaching recognition rates of 99% on 30 people by using perception sensor network in the real environment]. This work was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST).

References (24)

  • T. Cootes et al., Active shape models-their training and application, Comput. Vision Image Underst. (1995)
  • D. Cristinacce et al., Automatic feature localisation with constrained local models, Pattern Recognit. (2008)
  • R. Gross et al., Multi-PIE, Image Vision Comput. (2010)
  • E. Murphy-Chutorian et al., Head pose estimation in computer vision: a survey, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • T. Cootes et al., Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • H. Winnemöller et al., Real-time video abstraction, ACM Trans. Graph. (2006)
  • A. Puri, B. Lall, Exploiting perception for face analysis: image abstraction for head pose estimation, in: Computer...
  • B. Han, Y. Chae, Y.H. Seo, H. Yang, Head pose estimation based on image abstraction for multiclass classification, in: ...
  • C. Rother et al., GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. (2004)
  • C. Tomasi, R. Manduchi, Bilateral filtering for gray and color images, in: Sixth International Conference on Computer...
  • D. Lowe, Object recognition from local scale-invariant features, in: The Proceedings of the Seventh IEEE International...
  • Y. Boykov, M.P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, ...

    This paper has been recommended for acceptance by S. Wang.

1 Tel.: +82 42 350 7727; fax: +82 42 867 3567.
