Recursive head reconstruction from multi-view video sequences

doi:10.1016/j.cviu.2014.01.006

Computer Vision and Image Understanding

Volume 122, May 2014, Pages 182-201

https://doi.org/10.1016/j.cviu.2014.01.006

Highlights

• A particle filter method is proposed to estimate pose and shape of faces in videos.
• Noisy or outlier points are handled using multi-hypotheses sampling in the filter.
• The approach is evaluated on synthetic data to validate the method.
• Comparison with the Levenberg–Marquardt optimization is performed.
• Visual and biometric results on real videos confirm the efficiency of our method.

Abstract

Face reconstruction from images has been a core research topic for decades and is now used in many applications, such as identity verification and human–computer interaction. The 3D Morphable Model introduced by Blanz and Vetter has been widely used to this end, because its 3D modeling offers robustness to pose variation and adapts to the specificities of each face.

To overcome the limitations of methods using a single image, and since video has become more and more affordable, we propose a new method which exploits video sequences to consolidate the 3D head shape estimation over successive frames. Based on particle filtering, our algorithm updates the model estimation at each instant and is robust to noisy observations. A comparison with the Levenberg–Marquardt global optimization approach on various sets of data shows visual improvements in both pose and shape estimation. Biometric performance confirms this trend, with a mean reduction of 10% in False Rejection Rate.

Introduction

The recent rise of biometric techniques has stimulated their use to automate person recognition in a wide variety of systems, from computer locking devices to the authentication of people in airports. For each application, a compromise has to be found between the recognition rate of the biometric system on the one hand, and its ease of use, cost and computation time on the other hand. The different types of biometric identifiers used for human recognition (fingerprints, iris, face, veins, etc.) have different acquisition requirements and do not lead to the same recognition accuracy.

Among these, facial biometry has the advantage of being easy to acquire without any contact with sensors, but it is sensitive to acquisition conditions (illumination, pose, facial expression). This is especially the case in video surveillance and in recognition systems designed to avoid behavioral constraints in order to simplify the process from the user's point of view. Because such systems are not intrusive for users and faces are easy to acquire, specific work has focused on face reconstruction and comparison methods. Moreover, the face is sometimes the only biometric identifier available. To address these problems, face recognition has been an active research area for many years, first on still images [30], [9], [5], then on video [23]. This extension is particularly interesting since video-based systems are becoming more and more affordable and increase the number of available observations. When people move about in uncontrolled scenarios, the information from a face observed under different poses in the sequence can be merged and then compared to a reference picture. Among existing face recognition algorithms, a number of methods are based on the comparison of frontal views (the reference view is generally the frontal picture on ID documents). A frontal view therefore has to be generated from the acquisitions. This can be done via a 3D reconstruction of the face from the acquired images, from which synthesized views at any pose can be derived. Given the specificity of the face reconstruction problem (as opposed to object reconstruction without prior knowledge), model-based methods are preferred as they limit the risk of aberrant reconstructions, achieving a compromise between the information coming from the observations and the prior knowledge of the class of faces.

Most existing algorithms designed to estimate the parameters of such 3D models take a single image as input and depend heavily on the quality of the observations [5], [27]. To obtain more accurate results, several images can be used to consolidate the reconstruction. In [2], the authors proposed to fuse images based on stereovision. Video sequences have not been widely exploited, except by structure from motion methods, where the images are considered as an ensemble to estimate the model parameters [10]. In [32], the authors extend a single-image method to video sequences by fusing the estimations obtained at each instant independently, without verifying the model coherence. However, temporal constraints between the states estimated at successive instants are not integrated in the process, although they would improve the results.

To obtain a system that works in real time, we have to exploit the incoming video frames on the fly. To this end, we propose a new method based on the update of a 3D head model within a particle filter framework, which extends the work in [13] and has, to our knowledge, never been proposed. An important feature of the proposed approach is that previous observations are implicitly taken into account to estimate the model at the current instant. The key idea of our algorithm is to integrate the unknown shape coefficients in the particle state and to consider them as static parameters, unlike the pose, which varies over time. Besides an adaptation to real data, we propose here an improved algorithm for face estimation that is robust to noisy or aberrant detections thanks to multiple hypotheses handling, contrary to common gradient methods, which optimize a unique solution associated with a given set of observations.

In Section 2, we first present the chosen head model, before giving an overview of methods which estimate the associated parameters, both for single and multiple input images. In Section 3, we detail how to adapt a particle filter method to handle static parameters for facial shape estimation in video sequences, and propose some alternatives to improve this static parameter estimation. Section 4 presents how the observations are exploited in the particle filter and used to generate the frontal view. Section 5 details a method which is compared to our particle filter-based method in Section 6. This alternative method is based on a Levenberg–Marquardt optimization to estimate the pose and the shape. Experiments are done on both synthetic and real data. They are first analyzed on visual illustrations, to demonstrate the improvements at the image level. Then, since our final goal is to improve facial recognition performances by improving the head reconstruction using video sequences, an evaluation based on biometric performances is also proposed, before concluding with the perspectives of our method.

Section snippets

State of the art: 3D face reconstruction

The method we propose for face reconstruction from video sequences relies on a head model which is described in this section. We will then present the existing methods to estimate its parameters.
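For orientation, the 3D Morphable Model of Blanz and Vetter mentioned in the abstract is commonly written as a mean shape plus a linear combination of basis shapes. The expression below is that standard form, not a transcription of the paper's own section; the symbols $\bar{S}$ and $S_i$ are names assumed here, while $\alpha_i$ and $M$ follow the notation used further down this page:

$$S(\alpha) = \bar{S} + \sum_{i=1}^{M} \alpha_i S_i$$

Under this form, estimating the head shape amounts to estimating the coefficients $\alpha_i$, together with the scale and pose that place the model in each image.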

Static shape parameter estimation by particle filtering

In this paper, our goal is to estimate the parameters of the shape model introduced in Section 2.2. The methods presented previously iteratively update an initial estimate, and their output is a unique instance of the morphable model. Unlike these algorithms, we propose to represent the previous estimation as a density, which characterizes the probability of realization over the whole shape space. This allows us to cope with the inherent nature of noisy data and to

3D face reconstruction in videos

In our application, each particle state is decomposed into a dynamic part (the pose $x_{t}$) and a static part (the scale $\kappa$ and shape parameters $\alpha_{i}$, such that $\theta_{t} = \{\kappa, \alpha_{1}, \dots, \alpha_{M}\}$) and must be updated and evaluated with the incoming observations. In this part, we detail how we use the images for these steps, and introduce a new way to handle noisy observations based on the particle filter structure. Then, we present the texture extraction process once the shape evaluation is done.
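The split between a dynamic pose and static shape parameters can be illustrated with a deliberately small sketch. It is not the paper's implementation: the scalar pose, the single shape coefficient `alpha`, the two-landmark observation model, the noise levels, and the small jitter applied to the static part are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def likelihood(residual, sigma):
    """Unnormalized Gaussian likelihood of one observation residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2)


def filter_step(particles, observation, rng,
                pose_noise=0.1, obs_noise=0.2, shape_jitter=0.01):
    """One predict / weight / resample step over (pose, alpha) particles."""
    # Predict: the pose follows a random walk; the static shape coefficient
    # only receives a small jitter (an assumption made here to limit particle
    # degeneracy, not necessarily the paper's choice).
    predicted = [(pose + rng.gauss(0.0, pose_noise),
                  alpha + rng.gauss(0.0, shape_jitter))
                 for pose, alpha in particles]

    # Weight: toy observation model with two landmarks located at
    # pose - alpha and pose + alpha (hypothetical, for illustration only).
    z_left, z_right = observation
    weights = [likelihood(z_left - (pose - alpha), obs_noise) *
               likelihood(z_right - (pose + alpha), obs_noise)
               for pose, alpha in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # no hypothesis explains the data

    # Resample: draw particles with probability proportional to their weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def run_demo(seed=0, n_particles=500, n_frames=60, true_alpha=1.5):
    """Track a drifting pose while estimating a constant shape coefficient."""
    rng = random.Random(seed)
    particles = [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 3.0))
                 for _ in range(n_particles)]
    true_pose = 0.0
    for _ in range(n_frames):
        true_pose += 0.02  # slow head motion between frames
        observation = (true_pose - true_alpha + rng.gauss(0.0, 0.2),
                       true_pose + true_alpha + rng.gauss(0.0, 0.2))
        particles = filter_step(particles, observation, rng)
    pose_estimate = sum(p for p, _ in particles) / n_particles
    alpha_estimate = sum(a for _, a in particles) / n_particles
    return true_pose, pose_estimate, alpha_estimate
```

Calling `run_demo()` returns the final true pose together with the particle means for the pose and the shape coefficient. Because every frame reweights the same static coefficient, earlier observations keep influencing the shape estimate even though the pose changes at each instant.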

Alternative algorithm: global optimization by Levenberg–Marquardt

To evaluate the proposed particle filter, we compare it to an optimization method based on the Levenberg–Marquardt (LM) algorithm [24]. This method attempts to iteratively minimize an error defined with criteria similar to those used in the particle filter, by mixing gradient descent and Gauss–Newton algorithms. Unlike the particle filter method, this method is global, meaning that it estimates jointly the poses for all frames and the shape parameters (the same for the whole sequence). We use
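As a rough illustration of such a global fit, the sketch below runs a small hand-written Levenberg–Marquardt loop that jointly estimates one rotation angle per frame and a single shape coefficient shared by all frames. The 1D projection model, the landmark coordinates in `MEAN_POINTS`, the deformation in `BASIS`, and the finite-difference Jacobian are assumptions made for the example; the error criteria actually used in the paper are not reproduced here.

```python
import math

# Hypothetical mean landmarks (x, z) and one shape basis acting on x.
MEAN_POINTS = [(-1.0, 0.2), (0.0, 1.0), (1.0, 0.2)]
BASIS = [-0.3, 0.0, 0.3]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def levenberg_marquardt(residuals, params, iterations=50, lam=1e-3, eps=1e-6):
    """Minimize the sum of squared residuals with a damped Gauss-Newton step."""
    params = list(params)
    r = residuals(params)
    cost = sum(v * v for v in r)
    n = len(params)
    for _ in range(iterations):
        if cost < 1e-20:
            break
        # Forward-difference Jacobian: jac[j][k] = d r_k / d params_j.
        jac = []
        for j in range(n):
            shifted = params[:]
            shifted[j] += eps
            jac.append([(a - b) / eps for a, b in zip(residuals(shifted), r)])
        jtj = [[sum(x * y for x, y in zip(jac[i], jac[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(x * y for x, y in zip(jac[i], r)) for i in range(n)]
        for i in range(n):
            jtj[i][i] += lam * (jtj[i][i] + 1e-12)  # damping on the diagonal
        step = solve_linear(jtj, [-g for g in jtr])
        candidate = [p + d for p, d in zip(params, step)]
        r_new = residuals(candidate)
        cost_new = sum(v * v for v in r_new)
        if cost_new < cost:
            params, r, cost = candidate, r_new, cost_new
            lam *= 0.1   # accepted: behave more like Gauss-Newton
        else:
            lam *= 10.0  # rejected: behave more like gradient descent
    return params, cost


def project(theta, shape, mean_points, basis):
    """1D projection of 2D landmarks rotated by theta and deformed by shape."""
    return [(x + shape * b) * math.cos(theta) + z * math.sin(theta)
            for (x, z), b in zip(mean_points, basis)]


def fit_sequence(observations, mean_points, basis):
    """Jointly fit one angle per frame and one shared shape coefficient."""
    n_frames = len(observations)

    def residuals(params):
        out = []
        for theta, observed in zip(params[:n_frames], observations):
            predicted = project(theta, params[-1], mean_points, basis)
            out.extend(p - o for p, o in zip(predicted, observed))
        return out

    params, cost = levenberg_marquardt(residuals, [0.0] * (n_frames + 1))
    return params[:n_frames], params[-1], cost
```

For instance, generating noise-free observations with `project(t, 0.8, MEAN_POINTS, BASIS)` for a few angles `t` and passing them to `fit_sequence` recovers the angles and the shared coefficient. The parameter vector grows with the number of frames, which is what makes this kind of fit global rather than recursive.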

Evaluation

In this section, we will start by validating the proposed algorithm on a database of synthetic sequences for which the ground truth is available in terms of pose and shape parameters. After that, we will present the results of our method on real databases, both on visual aspects and biometric performances, which is the final purpose of the 3D face reconstruction in our case. Comparative results with the LM approach are presented in this second part, and show the interest of our approach.
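Since the biometric evaluation is reported in terms of False Rejection Rate, the following sketch shows one common way to compute it from matching scores. It assumes that higher scores mean more similar faces and that the decision threshold is set from a target False Acceptance Rate; the evaluation protocol actually used in the paper is not described on this page, and the function name and parameters are hypothetical.

```python
import math


def false_rejection_rate(genuine_scores, impostor_scores, target_far):
    """Return (FRR, threshold) at a threshold whose FAR is at most target_far.

    A comparison is accepted when its score is strictly above the threshold.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor comparisons that may be accepted at this FAR.
    allowed = math.floor(target_far * len(impostors) + 1e-9)
    if allowed < len(impostors):
        threshold = impostors[allowed]
    else:
        threshold = float("-inf")  # every impostor may be accepted
    rejected = sum(1 for s in genuine_scores if s <= threshold)
    return rejected / len(genuine_scores), threshold


genuine = [0.9, 0.8, 0.75, 0.4, 0.95]
impostor = [0.1, 0.2, 0.3, 0.5, 0.35, 0.15, 0.25, 0.05, 0.45, 0.6]
frr, threshold = false_rejection_rate(genuine, impostor, target_far=0.2)
print(frr, threshold)  # 0.2 0.45
```

With these made-up scores, two of the ten impostor comparisons may be accepted, so the threshold sits at the third-highest impostor score and one of the five genuine comparisons is rejected.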

Conclusion and future work

We have presented a novel approach to estimating the 3D pose and shape of a head in a video sequence. By considering the shape parameters as part of the hidden state in the particle filter algorithm, our method allows us to update the parameter distribution at each instant. Moreover, using the multi-hypothesis structure of the set of particles, we handle outliers in the set of feature points by varying the initial pose for each particle. In this way, there is also less chance of getting a

Acknowledgments

This work has been supported in part by the National Agency for Research and Technology (ANRT). We are grateful to all the people who took part in our acquisitions in the Morpho laboratory and agreed to appear in this article.

Catherine Herold graduated from the Ecole des Mines de Nancy, Nancy, France, in 2009, and received an MSc in computer science and image processing from the Paris VI University in 2010. She is currently a Ph.D. student at Telecom ParisTech and LIP6, Paris, in collaboration with Morpho, Safran. Her research interests include computer vision, especially in the areas of face tracking and reconstruction, and particle filter methods for dynamic and static parameter estimation.

References (36)

P. Minvielle et al., A Bayesian approach to joint tracking and identification of geometric shapes in video sequences, Image Vis. Comput. (2010)
T. Ahonen et al., Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
B. Amberg, A. Blake, A. Fitzgibbon, S. Romdhani, T. Vetter, Reconstructing high quality face-surfaces using model-based...
M.S. Arulampalam et al., A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Process. (2002)
T. Beeler et al., High-quality single-shot capture of facial geometry, ACM Trans. Graph. (SIGGRAPH) (2010)
V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: SIGGRAPH, ACM Press/Addison-Wesley Publishing...
D. Bradley et al., High resolution passive facial performance capture, ACM Trans. Graph. (SIGGRAPH) (2010)
P. Breuer, K.I. Kim, W. Kienzle, B. Schölkopf, V. Blanz, Automatic 3D face reconstruction from single images or video,...
A. Doucet et al., On sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comput. (2000)
G.J. Edwards, C.J. Taylor, T.F. Cootes, Interpreting face images using active appearance models, in: IEEE International...
N. Faggian, A.P. Paplinski, J. Sherrah, 3D morphable model fitting from multiple views, in: IEEE International...
P. Fearnhead, MCMC, sufficient statistics and particle filters, J. Comput. Graph. Stat. (2002)
W.R. Gilks et al., Following a moving target – Monte Carlo inference for dynamic Bayesian models, J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) (2001)
C. Herold, V. Despiegel, S. Gentric, S. Dubuisson, I. Bloch, Head shape estimation using a particle filter including...
R. Jafri et al., A survey of face recognition techniques, J. Inform. Process. Syst. (2009)
S. Julier, J. Uhlmann, A new extension of the Kalman filter to nonlinear systems, in: International Symposium on...
R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME: J. Basic Eng. (1960)
N. Kantas, A. Doucet, S.S. Singh, J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter...

Vincent Despiegel received the Agrégation de Mathématiques degree in 2004, is a former student of the École Normale Supérieure de Lyon, Lyon, France, and received a Ph.D. degree in 2007 from the Université de Grenoble, Grenoble, France, on the study of hyperelliptic curves and on how substitution boxes could be built for cryptographic applications. Since 2007, he has been a research and development staff member at Morpho, France, within the Biometric research team. From 2007 to 2011, he worked mainly on fingerprint algorithm improvement. In particular, he was involved in the European FP7 integrated project TURBINE (TrUsted Revocable Biometric IdeNtitiEs, 2008–2011) and worked on template protection and fingerprint template binarization. Since 2011, he has been the manager of a research team dedicated to face detection and tracking. His research interests include cryptography, image processing and pattern recognition dedicated to biometry.

Stéphane Gentric is Research Unit Manager at Morpho (www.morpho.com). He received his Ph.D. in pattern recognition from UPMC in 1999. From 1999 to 2002, he worked mainly on fingerprint algorithms. From 2002, he focused on face recognition, then iris recognition. He is now team leader for both biometries, driving all algorithmic aspects, from acquisition devices to large-scale matching systems. He has been involved in most of Morpho's projects in biometrics of the past 10 years, such as the Smartgate Australian border crossing system, NIST benchmarks, and the UIDAI project. His current research interests remain in pattern recognition for the improvement of biometric systems.

Séverine Dubuisson (M'06) was born in 1975. She received the Ph.D. degree in system control from the Compiègne University of Technology, Compiègne, France, in 2001. Since 2002, she has been an Associate Professor with the Laboratory of Computer Sciences, University Pierre and Marie Curie (Paris 6), Paris, France. Her research interests include computer vision, probabilistic models for video sequence analysis, and tracking.

Isabelle Bloch graduated from the Ecole des Mines de Paris, Paris, France, in 1986. She received the Master's degree from the University Paris 12, Paris, in 1987, the Ph.D. degree from the Ecole Nationale Supérieure des Télécommunications (Telecom ParisTech), Paris, in 1990, and the Habilitation degree from the University Paris 5, Paris, in 1995. She is currently a Professor with the Signal and Image Processing Department, Telecom ParisTech, in charge of the Image Processing and Understanding Group. Her research interests include 3D image and object processing, computer vision, 3D and fuzzy mathematical morphology, information fusion, fuzzy set theory, structural, graph-based, and knowledge-based object recognition, spatial reasoning, and medical imaging.

^☆: This paper has been recommended for acceptance by Kevin W. Bowyer, Ph.D.

¹: Part of this work was done while S. Dubuisson was at the LIP6 laboratory, Université Pierre et Marie Curie, 4 place Jussieu, Paris, France. C. Herold is also associated with this laboratory.
