Image and Vision Computing

Volume 64, August 2017, Pages 1-9

3D facial shape reconstruction using macro- and micro-level features from high resolution facial images

https://doi.org/10.1016/j.imavis.2017.05.001

Highlights

  • Qualitative results of the 3D reconstructions for the proposed and the previous method are supplemented.

  • Additional experiments for low-resolution images are supplemented.

  • Dense 3D facial shape reconstruction procedures (Section 2.4) are revised according to the reviewers' comments.

Abstract

Three-dimensional (3D) facial modeling and stereo matching-based methods are widely used for 3D facial reconstruction from 2D single-view and multiple-view images. However, these methods cannot realistically reconstruct 3D faces because they use insufficient numbers of macro-level Facial Feature Points (FFPs). This paper proposes an accurate and person-specific 3D facial reconstruction method that uses ample numbers of macro- and micro-level FFPs to enable coverage of all facial regions of high resolution facial images. Comparisons of 3D facial images reconstructed using the proposed method with ground-truth 3D facial images from the Bosphorus 3D database show that the method is superior to a conventional Active Appearance Model-Structure from Motion (AAM + SfM)-based method in terms of average 3D root mean square error between the reconstructed and ground-truth 3D faces. Further, the proposed method achieved outstanding accuracy in local facial regions such as the cheek, where extraction of FFPs is difficult for existing methods.
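
For reference, the evaluation metric reported above is the average 3D root mean square error (RMSE) between reconstructed and ground-truth faces. The following is a minimal sketch of that computation under simplifying assumptions (both surfaces already registered and sampled in point-to-point correspondence); it is not the paper's evaluation code, and the function name is illustrative.

```python
# Minimal sketch of the 3D RMSE between a reconstructed face and a ground-truth
# scan, assuming (N, 3) vertex arrays that are already aligned and in
# point-to-point correspondence; the registration step is not shown.
import numpy as np

def rmse_3d(reconstructed, ground_truth):
    rec = np.asarray(reconstructed, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    # per-vertex squared Euclidean error, then root of the mean
    sq_err = np.sum((rec - gt) ** 2, axis=1)
    return np.sqrt(np.mean(sq_err))
```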

Introduction

Researchers have focused considerable attention on three-dimensional (3D) facial reconstruction technologies because they are useful in various applications, such as 3D person-specific games and movie-character generation [1], surgical 3D simulation, virtual glasses simulation, frontal facial synthesis of suspects in surveillance camera systems, 3D teleconferencing, and pose-invariant facial recognition [2], [3], [4]. Further, with the rapid development of image-capturing device technology, High Resolution (HR) images can now be obtained easily and economically. Consequently, 3D facial reconstruction research needs to encompass exploration of the micro-level information existing only in HR images in order to more accurately and realistically reconstruct 3D faces.

Conventional 3D facial reconstruction technologies can be classified into hardware-based and software-based approaches (as shown in Table 1).

Hardware-based approaches reconstruct 3D faces using additional devices such as multiple stereo cameras [5], structured light [6], 3D laser scanners [7], and depth sensors [8] (e.g., Microsoft Kinect). Although these devices can be used to reconstruct 3D faces accurately and realistically, they incur high additional costs. They also have portability problems because they are bulky and require considerable space.

Software-based approaches reconstruct 3D faces using image sequences only. These approaches can generally be categorized into 3D Morphable Model (3DMM)-based methods [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], Structure from Motion (SfM)-based methods [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], and Shape from Shading (SFS)-based methods [34], [35], [36], [37], [38], [39], [40]. However, these methods cannot realistically reconstruct 3D faces because they use an insufficient number (approximately 80) of corresponding macro-level Facial Feature Points (FFPs). Further, they cannot extract good features for matching from facial sub-regions (e.g., cheekbone, forehead, moustache, and chin) in low-resolution images. Thus, the reconstructed 3D shapes in these regions are either biased to the mean facial shape or are not sufficiently accurate.

Much research has been conducted using micro-level features to recognize faces. Park et al. [41] and Klare et al. [42] analyzed three levels of features for face recognition. They suggested that Level 3 features—which they define as micro-level features such as moles, scars, freckles, pores, and birthmarks—are useful for distinguishing faces in HR images and can be extracted using Difference of Gaussian (DoG) or Laplacian of Gaussian (LoG) [43]. However, they could extract only a very limited quantity of micro-level features from human faces using their method. Thus, although it is possible to use these features for face recognition, it is difficult to accurately reconstruct 3D faces using only these features.
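
To make the DoG operator mentioned above concrete, the sketch below detects small blob-like responses of the kind associated with Level 3 features (pores, freckles, small scars) in a grayscale HR face crop. It assumes OpenCV and NumPy; the sigma values, the response threshold, and the file name are illustrative choices, not the settings used in [41], [42], [43].

```python
# Sketch of Difference-of-Gaussian (DoG) blob detection for micro-level
# features in a grayscale high-resolution face crop. Sigmas, threshold, and
# the file name are illustrative assumptions.
import cv2
import numpy as np

def dog_keypoints(gray, sigma_fine=1.0, sigma_coarse=1.6, thresh=8.0):
    """Return (x, y) coordinates of strong DoG responses."""
    g1 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma_fine)
    g2 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma_coarse)
    resp = np.abs(g1 - g2)                    # band-pass response to small blobs
    # keep local maxima of the response above a fixed threshold
    local_max = cv2.dilate(resp, np.ones((3, 3), np.uint8))
    ys, xs = np.nonzero((resp == local_max) & (resp > thresh))
    return np.stack([xs, ys], axis=1)

face = cv2.imread("face_hr.png", cv2.IMREAD_GRAYSCALE)  # hypothetical HR face crop
micro_points = dog_keypoints(face)
```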

Li et al. [44], [45] proposed a novel pore-scale facial feature framework for verification of faces in HR images. By adapting the Scale Invariant Feature Transform (SIFT) detector and descriptor to their pore-scale facial feature framework and using a candidate-constrained matching scheme, they are able to establish a large number of reliable keypoint correspondences between two face images of the same subject; the resulting features are dense and distinguishable, which is desirable for face verification. However, because their method does not restrict matching to corresponding local facial regions of the two face images, outliers can be generated in other local regions, and the computational cost of descriptor matching increases.
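
The idea of constraining matches to corresponding local regions can be illustrated with standard SIFT machinery, as in the sketch below. This is a generic stand-in rather than Li et al.'s exact candidate-constrained scheme; the region masks, the ratio threshold, and the function name are assumptions.

```python
# Sketch of descriptor matching restricted to a pair of corresponding local
# facial regions (e.g., the left cheek in two views). Restricting the search
# to matching regions suppresses outliers from unrelated parts of the face and
# reduces matching cost. Masks, ratio, and function name are assumptions.
import cv2

def match_region(img_a, img_b, mask_a, mask_b, ratio=0.8):
    """Match SIFT keypoints that lie inside the given uint8 region masks."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, mask_a)
    kp_b, des_b = sift.detectAndCompute(img_b, mask_b)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for candidates in matcher.knnMatch(des_a, des_b, k=2):
        if len(candidates) < 2:
            continue
        best, second = candidates
        if best.distance < ratio * second.distance:   # Lowe-style ratio test
            pairs.append((kp_a[best.queryIdx].pt, kp_b[best.trainIdx].pt))
    return pairs
```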

To address the issues outlined above, we propose an accurate and person-specific 3D facial reconstruction method that utilizes ample numbers of FFPs in the various local facial regions from HR facial images. The primary contributions of this paper and the novel aspects of the proposed method are as follows (Table 1):

  • The proposed method is the first to reconstruct 3D faces by combining macro-level features (e.g., eyes, eyebrows, nose, mouth, and facial contour) with micro-level features (e.g., pores, hairs, fine wrinkles, freckles, acne, and spots) from HR images. To the best of our knowledge, no other method that combines macro- and micro-level features to reconstruct 3D faces has been proposed in the literature to date.

  • The performance of the proposed method, which utilizes ample numbers of corresponding FFPs to reconstruct 3D faces, is better than that of previous sparse correspondence-based 3D reconstruction methods, which use an insufficient number of corresponding FFPs from macro-level features. Further, the proposed method achieves excellent accuracy in the local facial regions, such as cheeks, forehead, philtrum, and chin, from which FFPs are difficult to extract.

The remainder of this paper is organized as follows. Section 2 presents the proposed 3D facial reconstruction method. Section 3 outlines the experiments conducted and analyzes the results obtained. Section 4 presents our conclusions along with plans for future work.

Section snippets

The proposed method

The proposed method comprises the following steps: FFP localization, corresponding-point matching, outlier rejection, and sparse and dense 3D facial reconstruction. A flowchart of the proposed method is depicted in Fig. 1.
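
As an aside on the outlier-rejection stage named in this list, one common way to realize such a step is to discard correspondences that are inconsistent with a single epipolar geometry estimated by RANSAC, as sketched below. This is a generic sketch, not the specific rejection rule of the proposed method; the threshold and function name are illustrative.

```python
# Generic epipolar-geometry outlier filter: keep only matched 2D points that
# are consistent with one fundamental matrix estimated by RANSAC. This stands
# in for the outlier-rejection stage and is not the authors' exact rule.
import cv2
import numpy as np

def reject_outliers(pts_a, pts_b, reproj_thresh=1.5):
    """pts_a, pts_b: (N, 2) arrays of matched points in the two views."""
    pts_a = np.asarray(pts_a, dtype=np.float32)
    pts_b = np.asarray(pts_b, dtype=np.float32)
    F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC,
                                     reproj_thresh, 0.999)
    keep = mask.ravel().astype(bool)
    return pts_a[keep], pts_b[keep], F
```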

First, FFPs comprising macro- and micro-level features are extracted using Active Appearance Model (AAM) and DoG, respectively. Then, the facial region is divided into local regions based on AAM points, and the corresponding FFPs are matched using the Pore-SIFT (PSIFT)

The Bosphorus 3D face database

The proposed 3D reconstruction method using macro- and micro-level FFPs was evaluated on the Bosphorus database [51], which contains HR face images of 105 persons, collected as 3D data via the Inspeck Mega Capturor II 3D, a commercial structured-light-based 3D digitizer. A 1000 W halogen lamp was used in a dark room in order to obtain homogeneous lighting for good-quality texture images. The resolution of the 2D images is approximately 1400 × 1200 pixels, and the mean facial width is approximately 1100

Conclusions

In this work, we proposed a novel 3D facial reconstruction method that combines macro- and micro-level FFPs to reconstruct accurate and person-specific high resolution 3D facial images. The proposed method comprises the following steps.

From frontal and side-view facial images, macro-level FFPs are first extracted using the AAM, and then micro-level FFPs are obtained by calculating an adaptive pore-index for each local region based on the extracted macro-level FFPs. The 2D corresponding points

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. 2016R1A2B4006320).

References (51)

  • H. Liao et al., Rapid 3D face reconstruction by fusion of SFS and local morphable model, J. Vis. Commun. Image Represent. (2012)

  • A. Patel et al., Driving 3D morphable models using shading cues, Pattern Recogn. (2012)

  • D. Li et al., Design and learn distinctive features from pore-scale facial keypoints, Pattern Recogn. (2015)

  • A. Maejima et al., Instant casting movie theater: the future cast system, IEICE Trans. Inf. Syst. (2008)

  • V. Blanz et al., Face recognition based on fitting a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell. (2003)

  • Y. Lin et al., Accurate 3D face reconstruction from weakly calibrated wide baseline images with profile contours, IEEE Conf. Comput. Vis. Pattern Recognit. (2010)

  • L. Zhang et al., Space time faces: high resolution capture for modeling and animation, ACM Trans. Graph. (SIGGRAPH) (2004)

  • B. Gökberk et al., 3D face recognition: technology and applications

  • M. Zollhöfer et al., Automatic reconstruction of personalized avatars from 3D face scans, Comput. Anim. Virtual Worlds (2011)

  • Y. Lee et al., Single view-based 3D face reconstruction robust to self-occlusion, EURASIP J. Adv. Signal Process. (2012)

  • O. Aldrian et al., A linear approach of 3D face shape and texture recovery using a 3D morphable model

  • H. Rara et al., Model-based 3D shape recovery from single images of unknown pose and illumination using a small number of feature points

  • V. Blanz et al., A morphable model for the synthesis of 3D faces

  • Y. Shan et al., Model-based bundle adjustment with application to face modeling

  • Z. Zhang et al., Robust and rapid generation of animated faces from video images: a model-based modeling approach, Int. J. Comput. Vis. (2004)
This paper has been recommended for acceptance by Stefanos Zafeiriou.