Recursive head reconstruction from multi-view video sequences

https://doi.org/10.1016/j.cviu.2014.01.006

Highlights

  • A particle filter method is proposed to estimate the pose and shape of faces in videos.

  • Noisy or outlier points are handled by multi-hypothesis sampling in the filter.

  • The method is validated on synthetic data.

  • The method is compared with a Levenberg–Marquardt optimization.

  • Visual and biometric results on real videos confirm the effectiveness of our method.

Abstract

Face reconstruction from images has been a core topic over the last decades and is now used in many applications, such as identity verification or human–computer interaction. The 3D Morphable Model introduced by Blanz and Vetter has been widely used to this end, because its 3D modeling offers robustness to pose variation and adapts to the specificities of each face.

To overcome the limitations of single-image methods, and since video has become increasingly affordable, we propose a new method that exploits video sequences to consolidate the 3D head shape estimate over successive frames. Based on particle filtering, our algorithm updates the model estimate at each instant and is robust to noisy observations. A comparison with the Levenberg–Marquardt global optimization approach on various datasets shows visual improvements in both pose and shape estimation. Biometric performance confirms this trend, with a mean reduction of 10% in the False Rejection Rate.

Introduction

The recent rise of biometric techniques encourages their use to automate person recognition in a wide variety of systems, from computer locking devices to the authentication of people in airports. For each application, a compromise has to be found between the recognition rate of the biometric system on the one hand, and its ease of use, cost and computation time on the other. The different types of biometric identifiers used for human recognition (fingerprints, iris, face, veins, etc.) have different acquisition requirements and do not lead to the same recognition accuracy.

Among all of them, facial biometry has the advantage of being easily acquired without any contact with sensors, but it is sensitive to acquisition conditions (illumination, pose, facial expression). This is especially the case in video surveillance, or in recognition systems designed to avoid behavioral constraints in order to simplify the process from the user's point of view. Because such systems are non-intrusive and faces are easy to acquire, specific work has focused on face reconstruction and comparison methods. Moreover, the face is sometimes the only biometric identifier available.

To address the problems outlined above, face recognition has been an active research area for many years, first on still images [30], [9], [5], then on video [23]. This extension is particularly interesting since video-based systems are becoming increasingly affordable and provide more observations. When people move freely in uncontrolled scenarios, the information from a face observed under different poses in the sequence can be merged and then compared to a reference picture. Among existing face recognition algorithms, a number of methods are based on the comparison of frontal views (the reference view is generally the frontal picture on ID documents). A frontal view therefore has to be generated from the acquisitions. This can be done via a 3D reconstruction of the face from the acquired images, from which synthesized views at any pose can be derived. Given the specificity of the face reconstruction problem (as opposed to object reconstruction without prior knowledge), model-based methods are favored because they limit the risk of aberrant reconstructions, achieving a compromise between the information coming from the observations and the prior knowledge on the class of faces.
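The compromise between observations and prior knowledge is commonly written as a regularized least-squares (maximum a posteriori) fit, in which each shape coefficient is penalized according to its prior variance. The sketch below is a generic illustration of that trade-off, not the estimator used in the paper: it assumes a single coefficient alpha with a zero-mean Gaussian prior of variance `prior_var`, independent landmark noise of variance `noise_var`, and a hypothetical function name.

```python
def map_coefficient(basis, residual, noise_var, prior_var):
    """Minimize sum_k (residual_k - alpha * basis_k)^2 / noise_var + alpha^2 / prior_var.

    basis    -- displacement of each landmark per unit of alpha (one shape mode)
    residual -- observed landmarks minus the mean-shape landmarks
    Setting the derivative with respect to alpha to zero gives the closed form below.
    """
    data_precision = sum(b * b for b in basis) / noise_var
    data_term = sum(b * r for b, r in zip(basis, residual)) / noise_var
    return data_term / (data_precision + 1.0 / prior_var)


if __name__ == "__main__":
    # Same observations, increasingly unreliable: the estimate moves from the
    # least-squares value (2.0) toward the mean shape (alpha = 0).
    for noise_var in (0.01, 1.0, 100.0):
        print(noise_var, map_coefficient([1.0, 1.0], [2.0, 2.0], noise_var, prior_var=1.0))
```

Reliable observations let the data term dominate, while noisy ones pull the estimate back toward the mean face, which is what limits aberrant reconstructions.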

Most existing algorithms designed to estimate the parameters of such 3D models take a single image as input and depend heavily on the quality of the observations [5], [27]. To obtain more accurate results, however, it is useful to consolidate the reconstruction with several images. In [2], the authors proposed to fuse images based on stereovision. Video sequences have not been widely exploited, except by structure-from-motion methods, where the images are considered as an ensemble to estimate the model parameters [10]. In [32], the authors extend a single-image method to video sequences by fusing the estimates obtained independently at each instant, without verifying the coherence of the model. Temporal constraints between states estimated at successive instants are not integrated in this process, although integrating them would improve the results.

To build a system that works in real time, the incoming video frames have to be exploited on the fly. To this end, we propose a new method that updates a 3D head model within a particle filter framework; it extends the work in [13] and, to our knowledge, has not been proposed before. An important feature of the proposed approach is that previous observations are implicitly taken into account when estimating the model at the current instant. The key idea of our algorithm is to include the unknown shape coefficients in the particle state and to treat them as static parameters, unlike the pose, which varies over time. Besides an adaptation to real data, we propose here an improved face estimation algorithm that is robust to noisy or aberrant detections thanks to its handling of multiple hypotheses, unlike common gradient methods, which optimize a single solution for a given set of observations.
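Keeping static shape coefficients in the particle state next to a time-varying pose can be illustrated with a generic bootstrap particle filter on a toy problem. This is a minimal sketch, not the authors' implementation: the 1-D "landmark" template `TEMPLATE`, the random-walk pose model, the noise levels `q` and `r`, the uniform prior and all function names are hypothetical. Only the pose is propagated; the shape value is copied unchanged, so it is refined purely through weighting and resampling across frames.

```python
import math
import random

# Hypothetical 1-D "landmark" template: observed points are s * u_k + x_t + noise,
# where s is a static shape scale and x_t a time-varying pose (translation).
TEMPLATE = [-1.0, 0.0, 1.0]


def simulate(true_s, n_frames, q, r, rng):
    """Generate a toy sequence with a random-walk pose and a fixed shape."""
    x = 0.0
    frames = []
    for _ in range(n_frames):
        x += rng.gauss(0.0, q)
        frames.append([true_s * u + x + rng.gauss(0.0, r) for u in TEMPLATE])
    return frames


def log_likelihood(obs, x, s, r):
    """Independent Gaussian noise of std r on every landmark (up to a constant)."""
    return sum(-0.5 * ((z - (s * u + x)) / r) ** 2 for z, u in zip(obs, TEMPLATE))


def particle_filter(frames, n_particles, q, r, rng):
    """Bootstrap filter over the state (x_t, s); returns the posterior mean of s per frame."""
    # Pose starts at 0; the static shape is drawn from a broad uniform prior.
    particles = [[0.0, rng.uniform(0.5, 3.0)] for _ in range(n_particles)]
    estimates = []
    for obs in frames:
        # 1) Predict: only the dynamic pose moves, the static shape is kept as is.
        for p in particles:
            p[0] += rng.gauss(0.0, q)
        # 2) Weight each particle by the likelihood of the current observation.
        logw = [log_likelihood(obs, x, s, r) for x, s in particles]
        top = max(logw)  # subtract the max for numerical stability
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * p[1] for wi, p in zip(w, particles)))
        # 3) Multinomial resampling; copies keep their (unchanged) shape value.
        particles = [list(p) for p in rng.choices(particles, weights=w, k=n_particles)]
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    frames = simulate(true_s=1.8, n_frames=30, q=0.05, r=0.1, rng=rng)
    shape_estimates = particle_filter(frames, n_particles=500, q=0.05, r=0.1, rng=rng)
    print(f"shape estimate after {len(frames)} frames: {shape_estimates[-1]:.3f}")
```

Because the static component is never perturbed, resampling steadily reduces the number of distinct shape values (sample impoverishment); this weakness of the naive scheme is why improved static-parameter updates are worth considering.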

In Section 2, we first present the chosen head model, before giving an overview of methods that estimate the associated parameters from single or multiple input images. In Section 3, we detail how to adapt a particle filter to handle static parameters for facial shape estimation in video sequences, and propose some alternatives to improve the static parameter estimation. Section 4 presents how the observations are exploited in the particle filter and used to generate the frontal view. Section 5 details an alternative method, based on a Levenberg–Marquardt optimization of the pose and the shape, which is compared to our particle filter-based method in Section 6. Experiments are conducted on both synthetic and real data. They are first analyzed through visual illustrations, to demonstrate the improvements at the image level. Then, since our final goal is to improve facial recognition performance through better head reconstruction from video sequences, an evaluation based on biometric performance is also proposed, before concluding with perspectives for our method.


State of the art: 3D face reconstruction

The method we propose for face reconstruction from video sequences relies on a head model which is described in this section. We will then present the existing methods to estimate its parameters.

Static shape parameter estimation by particle filtering

In this paper, our goal is to estimate the parameters of the shape model introduced in Section 2.2. The methods presented previously iteratively update an initial estimate, and their output is a single instance of the morphable model. Unlike these algorithms, we propose to represent the previous estimate as a density, which characterizes the probability of realization over the whole shape space. This allows us to cope with the inherent nature of noisy data and to
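In practice, representing the estimate as a density means describing the shape space with a weighted set of particles rather than a single model instance. The sketch below is a generic illustration, not the authors' code: it summarizes such a set for one static coefficient by its weighted mean, weighted variance, a coarse histogram of the density, and the effective sample size N_eff = 1 / sum(w_i^2), a standard indicator of weight degeneracy in particle filters. The function names and bin layout are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def summarize(values, weights):
    """Weighted mean, variance and effective sample size of a particle set."""
    w = normalize(weights)
    mean = sum(wi * v for wi, v in zip(w, values))
    var = sum(wi * (v - mean) ** 2 for wi, v in zip(w, values))
    ess = 1.0 / sum(wi * wi for wi in w)  # N_eff = 1 / sum(w_i^2)
    return mean, var, ess


def weighted_histogram(values, weights, lo, hi, bins):
    """Probability mass per bin on [lo, hi]; particles outside the range are ignored."""
    w = normalize(weights)
    mass = [0.0] * bins
    for v, wi in zip(values, w):
        if lo <= v <= hi:
            k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            mass[k] += wi
    return mass


if __name__ == "__main__":
    alphas = [1.0, 2.0, 3.0]   # one shape coefficient per particle
    weights = [1.0, 1.0, 2.0]  # unnormalized likelihood weights
    print(summarize(alphas, weights))
    print(weighted_histogram(alphas, weights, 0.0, 4.0, 4))
```

A low N_eff relative to the number of particles signals that only a few shape hypotheses still carry weight.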

3D face reconstruction in videos

In our application, each particle state is decomposed into a dynamic part (the pose $x_t$) and a static part (the scale $\kappa$ and the shape parameters $\alpha_i$, such that $\theta_t = (\kappa, \alpha_1, \ldots, \alpha_M)$), both of which must be updated and evaluated with the incoming observations. In this section, we detail how we use the images for these steps, and introduce a new way to handle noisy observations based on the particle filter structure. Then, we present the texture extraction process performed once the shape estimation is done.
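To make the decomposition concrete, the sketch below stores the dynamic pose and the static parameters (scale kappa and shape coefficients alpha_1..alpha_M) in one particle, and turns a particle into a log-weight by comparing projected model points with detected 2-D landmarks. It assumes a linear morphable shape kappa * (mean + sum_i alpha_i * S_i), a pose reduced to a yaw angle plus a 2-D image translation, orthographic projection and an isotropic Gaussian landmark error of standard deviation `sigma`. These choices, and all names, are illustrative assumptions rather than the paper's exact observation model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


@dataclass
class Pose:
    """Dynamic part x_t (simplified, hypothetical: yaw angle + image translation)."""
    yaw: float
    tx: float
    ty: float


@dataclass
class Particle:
    pose: Pose           # dynamic: propagated at every frame
    kappa: float         # static: global scale
    alphas: List[float]  # static: shape coefficients alpha_1..alpha_M


def model_shape(mean: List[Point3], basis: List[List[Point3]],
                kappa: float, alphas: List[float]) -> List[Point3]:
    """Linear morphable shape: kappa * (mean + sum_i alpha_i * S_i), vertex by vertex."""
    shape = []
    for j, m in enumerate(mean):
        v = [m[c] + sum(a * s_i[j][c] for a, s_i in zip(alphas, basis)) for c in range(3)]
        shape.append((kappa * v[0], kappa * v[1], kappa * v[2]))
    return shape


def project(points: List[Point3], pose: Pose) -> List[Point2]:
    """Rotate about the vertical axis, drop depth (orthographic), then translate."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return [(c * x + s * z + pose.tx, y + pose.ty) for x, y, z in points]


def log_weight(p: Particle, mean: List[Point3], basis: List[List[Point3]],
               landmarks: List[Point2], sigma: float) -> float:
    """Gaussian log-likelihood (up to a constant) of the detected 2-D landmarks."""
    predicted = project(model_shape(mean, basis, p.kappa, p.alphas), p.pose)
    sq = sum((u - lu) ** 2 + (v - lv) ** 2
             for (u, v), (lu, lv) in zip(predicted, landmarks))
    return -0.5 * sq / sigma ** 2


if __name__ == "__main__":
    MEAN = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]     # toy 3-point "face"
    BASIS = [[(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, 0.0)]]  # one mode: face width
    p = Particle(Pose(yaw=0.2, tx=0.5, ty=-0.5), kappa=2.0, alphas=[1.0])
    detected = [(u + 0.05, v) for u, v in project(model_shape(MEAN, BASIS, 2.0, [1.0]), p.pose)]
    print(log_weight(p, MEAN, BASIS, detected, sigma=0.5))
```

Within a filter, only `pose` would be perturbed in the prediction step, while `kappa` and `alphas` are carried over and judged through these weights.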

Alternative algorithm: global optimization by Levenberg–Marquardt

To evaluate the proposed particle filter, we compare it to an optimization method based on the Levenberg–Marquardt (LM) algorithm [24]. This method iteratively minimizes an error defined with criteria similar to those used in the particle filter, by blending gradient descent and Gauss–Newton steps. Unlike the particle filter method, it is global: it jointly estimates the poses for all frames and the shape parameters (shared by the whole sequence). We use
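The global alternative can be illustrated on a toy problem in which one shared shape parameter and one pose per frame are estimated jointly. The sketch below is a textbook Levenberg–Marquardt loop, solving the damped normal equations (J^T J + lambda * diag(J^T J)) delta = -J^T r with a forward-difference Jacobian; it is not the authors' implementation. The 2-D template `U`, the rotation-plus-scale model and the damping schedule (lambda divided or multiplied by 10) are assumptions chosen for brevity.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def levenberg_marquardt(residuals, params, iters=100, lam=1e-3, eps=1e-7):
    """Minimize sum(r_k^2) over params; returns (params, cost)."""
    params = list(params)
    r = residuals(params)
    cost = sum(rk * rk for rk in r)
    n = len(params)
    for _ in range(iters):
        # Forward-difference Jacobian stored per parameter: J[j][k] = d r_k / d p_j.
        J = []
        for j in range(n):
            p = params[:]
            p[j] += eps
            J.append([(a - b) / eps for a, b in zip(residuals(p), r)])
        JtJ = [[sum(x * y for x, y in zip(J[i], J[j])) for j in range(n)] for i in range(n)]
        Jtr = [sum(x * y for x, y in zip(J[i], r)) for i in range(n)]
        while True:
            # Damped normal equations: (JtJ + lam * diag(JtJ)) step = -Jtr.
            A = [[JtJ[i][j] * (1.0 + lam if i == j else 1.0) for j in range(n)] for i in range(n)]
            step = solve(A, [-g for g in Jtr])
            trial = [pi + d for pi, d in zip(params, step)]
            r_trial = residuals(trial)
            c_trial = sum(rk * rk for rk in r_trial)
            if c_trial < cost:  # accept: move toward Gauss-Newton behavior
                params, r, cost = trial, r_trial, c_trial
                lam = max(lam / 10.0, 1e-12)
                break
            lam *= 10.0  # reject: shorter, more gradient-descent-like step
            if lam > 1e12:
                return params, cost  # no further progress possible
    return params, cost


# Hypothetical toy problem: a 2-D template scaled by a shared "shape" s and rotated
# by a per-frame "pose" angle; params = [s, angle_1, ..., angle_T].
U = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5)]


def predict(s, angle):
    c, sn = math.cos(angle), math.sin(angle)
    return [(s * (c * x - sn * y), s * (sn * x + c * y)) for x, y in U]


def make_residuals(observations):
    def residuals(params):
        s, angles = params[0], params[1:]
        res = []
        for obs, a in zip(observations, angles):
            for (px, py), (ox, oy) in zip(predict(s, a), obs):
                res += [px - ox, py - oy]
        return res
    return residuals


if __name__ == "__main__":
    obs = [predict(1.5, a) for a in (0.1, -0.2, 0.3)]
    est, final_cost = levenberg_marquardt(make_residuals(obs), [1.0, 0.0, 0.0, 0.0])
    print([round(v, 4) for v in est], final_cost)
```

Because all frames enter a single cost, every new frame requires re-running the whole optimization, in contrast with the recursive update of the particle filter.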

Evaluation

In this section, we start by validating the proposed algorithm on a database of synthetic sequences for which the ground truth is available in terms of pose and shape parameters. We then present the results of our method on real databases, both in terms of visual quality and biometric performance, which is the final purpose of 3D face reconstruction in our case. Comparative results with the LM approach are presented in this second part, and show the benefits of our approach.
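The biometric evaluation is summarized by the False Rejection Rate (FRR). As a reminder of how such a figure is usually obtained, the sketch below computes the FRR of genuine comparison scores at the threshold that keeps the False Acceptance Rate (FAR) of impostor scores at or below a target value. It assumes that higher scores mean more similar faces and that a comparison is accepted when its score is at least the threshold; the operating point used in the paper is not given in this excerpt, and the function names are hypothetical.

```python
import math


def far(impostor_scores, threshold):
    """Fraction of impostor comparisons wrongly accepted (score >= threshold)."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def frr(genuine_scores, threshold):
    """Fraction of genuine comparisons wrongly rejected (score < threshold)."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)


def threshold_for_far(impostor_scores, target_far):
    """Threshold with FAR <= target_far that accepts as many comparisons as possible."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_far * len(ranked))  # impostors we may accept
    if allowed >= len(ranked):
        return ranked[-1]  # every impostor may be accepted
    # Just above the (allowed + 1)-th highest impostor score (needs Python 3.9+).
    return math.nextafter(ranked[allowed], math.inf)


if __name__ == "__main__":
    impostors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    genuines = [0.5, 0.92, 0.97, 0.99]
    t = threshold_for_far(impostors, target_far=0.1)
    print(f"FAR = {far(impostors, t):.2f}, FRR = {frr(genuines, t):.2f}")
```

Comparing two reconstruction methods then amounts to comparing their FRR at the same FAR, using genuine and impostor scores computed from the synthesized frontal views.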

Conclusion and future work

We have presented a novel approach to estimating the 3D pose and shape of a head in a video sequence. By considering the shape parameters as part of the hidden state in the particle filter algorithm, our method updates the parameter distribution at each instant. Moreover, using the multi-hypothesis structure of the set of particles, we handle outliers in the set of feature points by varying the initial pose for each particle. In this way, there is also less chance of getting a
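The excerpt does not detail how varying the initial pose per particle absorbs outlying feature points, so the sketch below only illustrates the general principle with a RANSAC-like stand-in. Each hypothetical particle derives a 2-D translation from one randomly chosen landmark and is scored with a truncated quadratic error, so hypotheses built on an outlier receive negligible weight. The translation-only pose, the truncation constant `c`, the toy template and all names are assumptions.

```python
import math
import random


def robust_log_weight(template, observed, t, sigma, c):
    """Truncated quadratic error: each landmark costs at most c**2."""
    cost = 0.0
    for (ux, uy), (ox, oy) in zip(template, observed):
        sq = (ux + t[0] - ox) ** 2 + (uy + t[1] - oy) ** 2
        cost += min(sq, c * c)
    return -0.5 * cost / sigma ** 2


def multi_hypothesis_translation(template, observed, n_particles, sigma, c, rng):
    """Each particle takes its pose from one random landmark; returns the weighted mean."""
    poses, logw = [], []
    for _ in range(n_particles):
        k = rng.randrange(len(template))
        t = (observed[k][0] - template[k][0], observed[k][1] - template[k][1])
        poses.append(t)
        logw.append(robust_log_weight(template, observed, t, sigma, c))
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return (sum(wi * t[0] for wi, t in zip(w, poses)) / total,
            sum(wi * t[1] for wi, t in zip(w, poses)) / total)


def naive_translation(template, observed):
    """Least-squares translation (mean offset), which a single outlier can bias."""
    n = len(template)
    return (sum(o[0] - u[0] for u, o in zip(template, observed)) / n,
            sum(o[1] - u[1] for u, o in zip(template, observed)) / n)


if __name__ == "__main__":
    template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    observed = [(x + 2.0, y - 1.0) for x, y in template]      # true translation (2, -1)
    observed[2] = (observed[2][0] + 5.0, observed[2][1] + 5.0)  # one gross outlier
    print("naive:", naive_translation(template, observed))
    print("multi-hypothesis:",
          multi_hypothesis_translation(template, observed, 200, 0.5, 1.0, random.Random(1)))
```

With the outlier, the naive mean offset is pulled to (3, 0), while the weighted hypotheses stay close to the true translation (2, -1).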

Acknowledgments

This work has been supported in part by the National Agency for Research and Technology (ANRT). We are grateful to everyone who took part in our acquisitions in the Morpho laboratory and agreed to appear in this article.

Catherine Herold graduated from the Ecole des Mines de Nancy, Nancy, France, in 2009 and received an MSc in computer science and image processing from Paris VI University in 2010. She is currently a Ph.D. student at Telecom ParisTech and LIP6, Paris, in collaboration with Morpho, Safran. Her research interests include computer vision, especially face tracking and reconstruction, and particle filter methods for dynamic and static parameter estimation.

References (36)

  • P. Minvielle et al., A Bayesian approach to joint tracking and identification of geometric shapes in video sequences, Image Vis. Comput. (2010)
  • T. Ahonen et al., Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • B. Amberg, A. Blake, A. Fitzgibbon, S. Romdhani, T. Vetter, Reconstructing high quality face-surfaces using model-based...
  • M.S. Arulampalam et al., A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Process. (2002)
  • T. Beeler et al., High-quality single-shot capture of facial geometry, ACM Trans. Graph. (SIGGRAPH) (2010)
  • V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: SIGGRAPH, ACM Press/Addison-Wesley Publishing...
  • D. Bradley et al., High resolution passive facial performance capture, ACM Trans. Graph. (SIGGRAPH) (2010)
  • P. Breuer, K.I. Kim, W. Kienzle, B. Schölkopf, V. Blanz, Automatic 3D face reconstruction from single images or video,...
  • A. Doucet et al., On sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comput. (2000)
  • G.J. Edwards, C.J. Taylor, T.F. Cootes, Interpreting face images using active appearance models, in: IEEE International...
  • N. Faggian, A.P. Paplinski, J. Sherrah, 3D morphable model fitting from multiple views, in: IEEE International...
  • P. Fearnhead, MCMC, sufficient statistics and particle filters, J. Comput. Graph. Stat. (2002)
  • W.R. Gilks et al., Following a moving target – Monte Carlo inference for dynamic Bayesian models, J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) (2001)
  • C. Herold, V. Despiegel, S. Gentric, S. Dubuisson, I. Bloch, Head shape estimation using a particle filter including...
  • R. Jafri et al., A survey of face recognition techniques, J. Inform. Process. Syst. (2009)
  • S. Julier, J. Uhlmann, A new extension of the Kalman filter to nonlinear systems, in: International Symposium on...
  • R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME: J. Basic Eng. (1960)
  • N. Kantas, A. Doucet, S.S. Singh, J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter...

    Vincent Despiegel received the Agrégation de Mathématiques degree in 2004, is a former student of the École Normale Supérieure de Lyon, Lyon, France, and received a Ph.D. degree in 2007 from the Université de Grenoble, Grenoble, France, for work on hyperelliptic curves and on how substitution boxes could be built for cryptographic applications. Since 2007, he has been a research and development staff member at Morpho, France, within the Biometric research team. From 2007 to 2011, he worked mainly on improving fingerprint algorithms. In particular, he was involved in the European FP7 integrated project TURBINE (TrUsted Revocable Biometric IdeNtitiEs, 2008–2011) and worked on template protection and fingerprint template binarization. Since 2011, he has managed a research team dedicated to face detection and tracking. His research interests include cryptography, image processing and pattern recognition applied to biometrics.

    Stéphane Gentric is Research Unit Manager at Morpho (www.morpho.com). He received his Ph.D. in pattern recognition from UPMC in 1999. From 1999 to 2002, he worked mainly on fingerprint algorithms. From 2002, he focused on face recognition, then iris recognition. He is now team leader for both biometrics, driving all algorithmic aspects, from acquisition devices to large-scale matching systems. He has been involved in most of Morpho's biometrics projects of the past 10 years, such as the Smartgate Australian border crossing system, NIST benchmarks, and the UIDAI project. His current research interests remain pattern recognition for the improvement of biometric systems.

    Séverine Dubuisson (M'06) was born in 1975. She received the Ph.D. degree in system control from the Compiègne University of Technology, Compiègne, France, in 2001. Since 2002, she has been an Associate Professor with the Laboratory of Computer Sciences, University Pierre and Marie Curie (Paris 6), Paris, France. Her research interests include computer vision, probabilistic models for video sequence analysis, and tracking.

    Isabelle Bloch graduated from the Ecole des Mines de Paris, Paris, France, in 1986, and received the Master's degree from the University Paris 12, Paris, in 1987, the Ph.D. degree from the Ecole Nationale Supérieure des Télécommunications (Telecom ParisTech), Paris, in 1990, and the Habilitation degree from the University Paris 5, Paris, in 1995. She is currently a Professor with the Signal and Image Processing Department, Telecom ParisTech, in charge of the Image Processing and Understanding Group. Her research interests include 3D image and object processing, computer vision, 3D and fuzzy mathematical morphology, information fusion, fuzzy set theory, structural, graph-based, and knowledge-based object recognition, spatial reasoning, and medical imaging.

    This paper has been recommended for acceptance by Kevin W. Bowyer, Ph.D.

    1 Part of this work was done while S. Dubuisson was at the LIP6 laboratory, Université Pierre et Marie Curie, 4 place Jussieu, Paris, France. C. Herold is also associated with this laboratory.
