Elsevier

Neurocomputing

Volume 100, 16 January 2013, Pages 197-205
Demographic classification from face videos using manifold learning

https://doi.org/10.1016/j.neucom.2011.10.040

Abstract

Research on automatic demographic classification is still in its infancy despite its vast potential applications. The few existing works are based only on static images, whereas the input data in many real-world applications now consist of video sequences. From these observations, and inspired by studies in neuroscience emphasizing manifold ways of visual perception, we propose a novel approach to demographic classification from video sequences which encodes and exploits the correlation between the face images through manifold learning. Our extensive experiments on the gender and age classification problems show that the proposed manifold learning based approach yields excellent results, outperforming those of traditional static image based methods. Furthermore, to gain insight into the proposed approach, we also investigate an LBP (local binary patterns) based spatiotemporal method as a baseline system for combining spatial and temporal information for demographic classification from videos.

Introduction

Automatic demographic classification from human faces generally includes gender recognition (i.e. man vs. woman), age categorization (e.g. child, youth, adult, middle-age and elderly) and ethnicity classification (e.g. Asian, Caucasian and African). This is very useful for more affective human–computer interaction (HCI) and smart environments in which the systems should adapt to users whose behaviors and preferences not only differ with age but are also specific to a given ethnicity and/or gender. Automatic demographic classification is also useful in many other applications such as content-based image and video retrieval, restricting access to certain areas based on gender and/or age, enhancing the performance of biometric identification systems, collecting demographic information in public places, counting the number of women entering a retail store and so on.

Though there has been a great deal of progress in face analysis in recent years, demographic classification tasks have not shared in that progress, as most work has focused on face detection and recognition. Consequently, the design of algorithms that are effective in discriminating between males and females, or in classifying faces into different age and ethnic categories, is still challenging and remains an open area of research.

The first attempts to apply computer vision techniques to gender classification date back to the early 1990s. Since then, significant progress has been made and several approaches have been reported in the literature. Fundamentally, the proposed techniques differ in (i) the choice of the facial representation, ranging from simple raw pixels to more complex features such as Gabor responses, and (ii) the design of the classifier, ranging from nearest neighbor (NN) and Fisher linear discriminant (FLD) classifiers to artificial neural networks (ANN), support vector machines (SVM) and boosting schemes. For instance, Moghaddam and Yang [1] used raw pixels as inputs to SVMs, while Baluja and Rowley [2] adopted AdaBoost to combine weak classifiers, constructed using simple pixel comparisons, into a single strong classifier. Both systems showed good classification rates. A comparative analysis of gender classification approaches can be found in [3].

While gender recognition has been explored by many researchers [3], automatic age and ethnicity classification have received far less attention despite their vast potential applications. Among the notable attempts are the works of Lanitis et al. [4] and Geng et al. [5]. Lanitis et al. used a simple quadratic aging function to model the relation between face and age, while Geng et al. modeled the sequence of a particular individual's face images sorted in time order by a subspace in which unseen faces are then projected for age estimation. A recent survey on different methods for age estimation can be found in [6].

Recently, the local binary pattern (LBP) features [7], [8] have been successfully applied to demographic classification from static images (e.g. by Yang and Ai [9]) and to age estimation (e.g. by Chen et al. [10]). More recently, the combination of global and local features has also been shown to provide very good results in age estimation [11].

It appears that most proposed approaches to demographic classification, including those cited above, are based only on static images and assume well-aligned faces, whereas the input data in real-world applications (such as video surveillance and HCI) now generally consist of video sequences in which the face alignment assumption does not always hold. The question that arises is how to efficiently perform automatic demographic classification from face video sequences. A straightforward approach would be to apply methods developed for still images to some selected frames and then fuse the results at the decision or score level. Obviously, such an approach is not optimal, as it only exploits the abundance of frames in the videos and ignores the temporal correlation between the face images. Only recently have researchers started to pay increasing attention to facial temporal information, especially for face and facial expression recognition from videos (e.g. [12], [13], [14], [15], [16], [17], [18]), using, for instance, spatiotemporal representations. However, demographic classification tasks have not yet been investigated from such points of view. To the best of our knowledge, no previous work has addressed demographic classification from face video sequences.
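
The frame-wise fusion strategy mentioned above can be sketched as follows. This is only an illustrative example, not the method evaluated in the paper: the per-frame scores are made-up values, and the assumption that each score is the probability of the "female" class (with a 0.5 threshold) is ours.

```python
from collections import Counter
from statistics import mean


def fuse_decisions(frame_labels):
    """Decision-level fusion: majority vote over per-frame labels."""
    counts = Counter(frame_labels)
    # most_common(1) returns a one-element list [(label, count)].
    return counts.most_common(1)[0][0]


def fuse_scores(frame_scores, threshold=0.5):
    """Score-level fusion: average the per-frame scores, then threshold once."""
    return "female" if mean(frame_scores) >= threshold else "male"


# Hypothetical outputs of a still-image gender classifier on five frames.
# Each score is assumed to be the probability of the "female" class.
scores = [0.9, 0.45, 0.45, 0.4, 0.95]
labels = ["female" if s >= 0.5 else "male" for s in scores]

decision_level = fuse_decisions(labels)  # vote over hard per-frame decisions
score_level = fuse_scores(scores)        # vote over soft per-frame scores
```

On this toy input the two fusion levels disagree, because the majority vote discards the confidence of each frame. Both functions return the same result for any permutation of the frames, which is exactly the limitation noted above: the temporal order of the face images is ignored.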

Inspired by studies in neuroscience emphasizing manifold ways of visual perception [19] and also motivated by the psychophysical findings (e.g. [20], [21]) which indicate that facial temporal changes can provide valuable information to face and gender recognition, we consider in this work the problem of demographic classification from video sequences and propose a novel approach which exploits and encodes the correlation between the face images through manifold learning. Thus, we look at the problem of demographic classification from totally new perspectives.

The goal of face manifold learning is to discover the hidden low-dimensional structure of the face images. Thus, instead of treating each facial image as a “single” or “isolated” pattern in the image space and then fusing the results, we propose to learn and discover the hidden low-dimensional nonlinear manifolds of the faces in each demographic class (e.g. male class, female class, child class, etc.). In other words, we cluster the face sequences in the low-dimensional space based on their intrinsic demographic characteristics. Then, a target face sequence can be projected into the manifolds for classification. The “closest” manifold, in terms of a newly introduced manifold distance measure, will then determine the gender (or age or ethnicity) of the person in the target sequence.
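
The "closest manifold" decision rule can be illustrated with a minimal sketch. The manifold distance measure introduced in the paper is not described in this excerpt, so the code below substitutes a simple stand-in (the mean nearest-neighbour distance between point sets). The embedding step is also omitted: the 2-D coordinates are hypothetical values standing in for already embedded face images, and all function names are ours.

```python
import math


def point_to_set(p, points):
    """Smallest Euclidean distance from point p to any point in the set."""
    return min(math.dist(p, q) for q in points)


def set_distance(sequence, class_points):
    """Stand-in distance (an assumption, not the paper's measure):
    mean nearest-neighbour distance from the sequence to the class points."""
    return sum(point_to_set(p, class_points) for p in sequence) / len(sequence)


def classify_sequence(sequence, class_manifolds):
    """Return the name of the class whose point set is closest to the sequence."""
    return min(
        class_manifolds,
        key=lambda name: set_distance(sequence, class_manifolds[name]),
    )


# Hypothetical low-dimensional embeddings of training frames per class.
class_manifolds = {
    "male": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "female": [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],
}
# Hypothetical embedding of the frames of one target sequence.
target_sequence = [(0.5, 4.0), (1.5, 4.5), (1.0, 3.5)]

predicted = classify_sequence(target_sequence, class_manifolds)
```

The point of the sketch is that the whole sequence is compared with each class as a set, rather than classifying every frame separately and fusing the results afterwards.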

Recently, there has also been increasing interest in face manifold learning, but most of the works were devoted to face recognition with the aim of coping with pose and illumination changes in the videos (e.g. [22], [23], [24]). To gain insight into our proposed approach, we also derive and investigate a baseline system that uses an LBP based spatiotemporal representation for combining facial structure (i.e. spatial information) and dynamics (i.e. temporal information), and support vector machines (SVM) for classification. Our choice of adopting the LBP spatiotemporal representation is motivated by the recent success of using it for combining appearance and motion for face and facial expression recognition [18], [17] and also for dynamic texture recognition [17].

The preliminary results of the research on the gender recognition problem have been published in part as a conference paper in [25]. In this article, we include new experiments on two other demographic classification problems, namely age estimation and ethnicity classification, and report the complete and improved formulation, a thorough investigation and an extended experimental evaluation of our methodology.

Among the salient contributions of this article are: (i) a novel manifold based method for gender and age classification from face sequences is presented and extensively evaluated; (ii) an extension of the locally linear embedding algorithm [26] to handle face sequences is proposed; (iii) a simple yet efficient manifold to manifold distance measure is introduced; and (iv) a comparison between still image and video based analysis for demographic classification is provided.

The rest of this paper is organized as follows. Section 2 describes our proposed approach to demographic classification from videos using manifold learning. Section 3 presents a baseline method using an LBP based spatiotemporal representation and SVMs. Sections 4 and 5 then present extensive experiments on two demographic classification tasks, namely gender and age classification from videos. Section 6 further discusses the results and presents preliminary experiments on ethnicity classification. Finally, we draw a conclusion in Section 7.

Section snippets

Proposed approach

We describe below our proposed approach to demographic classification from videos. For clarity, we explain our derivations on the gender recognition case. Later, we show how to also apply the described methodology to the age estimation problem.

The idea of approaching demographic classification from manifold learning perspective is inspired by neuroscience studies pointing out the manifold ways of visual perception [19]. Indeed, the facial images are not “isolated” patterns in the image space

LBP based spatiotemporal approach

For comparison, we also consider one of the state-of-the-art approaches to face analysis from videos which is based on a spatiotemporal representation for combining facial structure and dynamics. We adopt the LBP based spatiotemporal representation because of its recent excellent performance in modeling moving faces for face and facial expression recognition [18], [17] and also for dynamic texture recognition [17].
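
As background for the spatial building block of such a representation, the basic 3×3 LBP operator can be sketched as below: each neighbour is thresholded against the centre pixel and the eight results are read as one 8-bit code. This is a minimal sketch of the basic static operator only. The spatiotemporal variant used in the baseline is not reproduced here, and the neighbour ordering and function names are our own conventions.

```python
def lbp_code(image, row, col):
    """Basic 3x3 LBP code of the pixel at (row, col)."""
    center = image[row][col]
    # Eight neighbours, clockwise from the top-left corner.
    # The starting point and direction are a convention assumed here.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 grey-level patch with a single interior pixel (value 6).
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)  # bits 0, 4, 5, 6, 7 are set -> 241
```

The histogram of such codes over a face region is what typically serves as the texture descriptor passed to a classifier.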

The original LBP operator, introduced by Ojala et al. [35], [8], forms labels for

Experimental data

To evaluate and compare the performance of the two approaches in gender recognition from videos, we considered three different publicly available video face databases (namely CRIM [37], VidTIMIT [38] and Cohn–Kanade [39]) containing several face sequences subject to changes caused by different factors including face image resolution, illumination variations, head movements and facial expressions. CRIM is a large set of 591 face sequences showing 20 persons (10 female and 10 male) reading

Experiments on age classification from videos

To evaluate the performance of the different approaches in age classification from videos, we collected from the Internet a set of video sequences mainly showing celebrities giving speeches in TV programs and news. For some videos of unknown individuals (especially children), we manually labeled them using our (human) perception of age. Then, we randomly segmented the videos and extracted about 2000 video shots of about 300 frames each. In the experiments, we adopted a 10-fold cross validation

Discussion and analysis

We looked at the problem of video based demographic classification from new viewpoints and proposed a novel approach based on manifold learning. The proposed approach is compared against a state-of-the-art method for face analysis from videos that uses a spatiotemporal representation for combining spatial and temporal information. We also implemented and considered two static image based methods as baseline systems. We extensively evaluated the performance of all these methods on large and

Conclusion

Research on automatic demographic classification is still in its infancy despite the vast potential applications. The few existing works are only based on static images while nowadays input data in many real-world applications consist of video sequences. From this observation and also motivated by studies in psychophysics which indicate that facial temporal changes can provide valuable information to face analysis, we looked at the problem of demographic classification from new perspectives. We

Acknowledgment

The financial support provided by the EU FP7 project TABULA RASA (grant agreement #257289), and the Academy of Finland is gratefully acknowledged.

References (40)

  • Y. Fu et al., Age synthesis and estimation via faces: a survey, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • T. Ahonen et al., Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • T. Ojala et al., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • Z. Yang, H. Ai, Demographic classification with local binary patterns, in: The 2nd International Conference on...
  • C. Chen, W. Yang, Y. Wang, K. Ricanek, K. Luu, Facial feature fusion and model selection for age estimation, in: IEEE...
  • W. Yang, C. Chen, K. Ricanek, C. Sun, Ensemble of global and local features for face age estimation, in: Proceedings of...
  • Y. Li, Dynamic Face Models: Construction and Applications, Ph.D. Thesis, Queen Mary, University of London,...
  • B. Li et al., Face verification through tracking facial features, J. Opt. Soc. Am. (2001)
  • S. Zhou, R. Chellappa, Probabilistic human recognition from video, in: European Conf. on Computer Vision, 2002, pp....
  • S. Zhou, V. Krueger, R. Chellappa, Face recognition from video: a condensation approach, in: IEEE Int. Conf. on...

    Abdenour Hadid received his Engineer Diploma in Computing from the National Institute of Informatics (INI, Algiers), in 1997, and the Doctor of Science in Technology degree in electrical and information engineering from the University of Oulu, Finland, in 2005. Now, he is an adjunct professor and senior researcher in the Machine Vision Group, University of Oulu. His research interests include biometrics and facial image analysis, local binary patterns, manifold learning, human-machine interaction, and mobile applications. He has authored several papers in international conferences and journals, and served as a reviewer for many international conferences and journals. He is a member of the Pattern Recognition Society of Finland and the International Association for Pattern Recognition (IAPR). He served as a member of the organizing committee of several international workshops. He gave several invited talks and tutorials in international events. He visited the Institute of Automation at the Chinese Academy of Sciences (Beijing, China) in spring 2006, the Institute of Industrial Science at the University of Tokyo (Tokyo, Japan) in summer 2009, and Eurecom Institute at Sophia Antipolis (France) in summer 2010. Recently, he co-authored a book titled Computer Vision Using Local Binary Patterns, published by Springer in 2011.

    Matti Pietikäinen received the Doctor of Science in Technology degree from the University of Oulu, Finland, in 1982. In 1981, he established the Machine Vision Group at the University of Oulu. This group has achieved a highly respected position in its field, and its research results have been widely exploited in industry. Currently, he is a professor of information engineering, scientific director of Infotech Oulu Research Center, and leader of the Machine Vision Group at the University of Oulu. From 1980 to 1981 and from 1984 to 1985, he visited the Computer Vision Laboratory at the University of Maryland. His research interests include texture-based computer vision, face analysis, activity analysis, and their applications in human–computer/robot interaction, person identification, visual surveillance, and image/video retrieval. He has authored over 250 refereed papers in international journals, books, and conference proceedings and about 100 other publications or reports. He has made pioneering and lasting contributions to local binary pattern (LBP) methodology, texture-based image and video analysis, and facial image analysis. His papers are frequently cited and research results are used in various applications around the world. He is an associate editor of the Image and Vision Computing journal, and was an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and Pattern Recognition journals. He was guest editor (with L.F. Pau) of a two-part special issue on “Machine Vision for Advanced Production” for the International Journal of Pattern Recognition and Artificial Intelligence (also reprinted as a book by World Scientific in 1996). He was also the editor of the book Texture Analysis in Machine Vision (World Scientific, 2000) and a co-editor of the book Machine Learning for Vision-Based Motion Analysis: Theory and Techniques (Springer-Verlag, 2011). He has served as a reviewer for numerous journals and conferences. Recently, he co-authored a book titled Computer Vision Using Local Binary Patterns, published by Springer in 2011. He was the president of the Pattern Recognition Society of Finland from 1989 to 1992. From 1989 to 2007 he served as a member of the Governing Board of the International Association for Pattern Recognition (IAPR), and became one of the founding fellows of the IAPR in 1994. He regularly serves on program committees of the top conferences and workshops of his field. Recently, he was an area chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), a co-chair of Workshops at the International Conference on Pattern Recognition (ICPR 2008), and a co-chair of a workshop series on Machine Learning for Vision-based Motion Analysis (MLVMA) held in conjunction with the European Conference on Computer Vision (ECCV 2008), the International Conference on Computer Vision (ICCV 2009) and the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011). He has given tutorials on texture-based computer vision at the SCIA 2005, ICPR 2006, ICCV 2009 and CVPR 2011 conferences.
