
Pattern Recognition

Volume 41, Issue 3, March 2008, Pages 894-905

Dynamic training using multistage clustering for face recognition

https://doi.org/10.1016/j.patcog.2007.06.017

Abstract

A novel face recognition algorithm that uses dynamic training in a multistage clustering scheme is presented and evaluated. This algorithm uses discriminant analysis to project the face classes and a clustering algorithm to partition the projected face data, thus forming a set of discriminant clusters. Then, an iterative process creates subsets that contain the most useful clusters, with the cardinality of each subset defined by an entropy-based measure. The best match to the test face is found when only a single face class is retained. This method was tested on the ORL, XM2VTS and FERET face databases, whereas the UMIST database was used to train the proposed algorithm. Experimental results indicate that the proposed framework provides a promising solution to the face recognition problem.

Introduction

Face recognition (FR) is an active research field that has received great attention in the past several years. An FR system usually attempts to determine the identity of the test face by computing and ranking all similarity scores between the test face and all human faces stored in the system database that constitute the training set. However, the performance of many state-of-the-art FR methods deteriorates rapidly when large databases, in terms of the number of faces, are considered [1], [2]. Specifically, the facial feature representation obtained by methods that use linear criteria, which normally require images to follow a convex distribution, is not capable of generalizing over all the variations introduced by, e.g., large differences in viewpoint, illumination and facial expression, when large data sets are used. When nonlinear face representation methods are employed, problems such as over-fitting, computational complexity and difficulties in optimizing the involved parameters often appear [1]. Moreover, the performance of FR methods deteriorates when there is a lack of a sufficiently large number of training samples for each face in the database as, in this case, the intra-person variations cannot be modelled properly. More specifically, linear methods, such as linear discriminant analysis (LDA), often suffer from the small sample size (SSS) problem, where the dimensionality of the samples is larger than the number of available training samples [3].

Recently, various methods have been proposed in order to limit the degradation that the two aforementioned types of problems impose on recognition performance. The ‘divide and conquer’ principle, by which a database is decomposed into smaller sets in order to piecewise learn the complex distribution by a mixture of local linear models, has been widely used. In Ref. [1], a separability criterion is employed to partition a training set from a large database into a set of smaller maximal separability clusters (MSCs) by utilizing an LDA-like technique. Based on these MSCs, a hierarchical classification framework that consists of two levels of nearest neighbour (NN) classifiers is employed and the match is found. The work in Ref. [4] concentrates on the hierarchical partitioning of the feature spaces using hierarchical discriminant analysis (HDA). A space tessellation tree is generated using the most expressive features (MEF), by employing principal component analysis (PCA), and the most discriminating features (MDF), by employing LDA, at each tree level. This is done to avoid the limitations linked to global features, by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples. In general, hierarchical trees have been extensively used for pattern recognition purposes.

LDA is an important statistical tool that has been shown to be effective in FR or verification problems [5], [6]. Traditionally, in order to improve LDA-based methods and provide solutions for the SSS problem, LDA is applied in a lower-dimensional PCA subspace, so as to discard the null space (i.e., the subspace defined by the eigenvectors that correspond to zero eigenvalues) of the within-class scatter matrix of the training data set [5]. However, it has been shown [7] that significant discriminant information is contained in the discarded space and alternative solutions have been sought. Specifically, in Ref. [8] a direct-LDA (DLDA) algorithm is presented that discards the null space of the between-class scatter matrix, which is claimed to contain no useful information, rather than discard the null space of the within-class scatter matrix. More recently, in an attempt to address the SSS problem, the regularized LDA method (RLDA) was presented in Ref. [9], which employs a regularized Fisher's separability criterion. The purpose of regularization is to reduce the high variance related to the eigenvalue estimates of the within-class scatter matrix, at the expense of potentially increased classification bias.
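
As a concrete illustration of the regularization idea, the following is a minimal sketch of a regularized Fisher discriminant projection, written in Python with NumPy and SciPy. It only conveys the general mechanism of shrinking the within-class scatter matrix before solving the Fisher eigenproblem; the function name, the shrinkage form and the parameter `reg` are illustrative assumptions and do not reproduce the exact RLDA formulation of Ref. [9].

```python
# Minimal sketch of a regularized Fisher discriminant projection.
# It illustrates the general idea of regularizing the within-class
# scatter matrix; it is NOT the exact RLDA of Ref. [9].
import numpy as np
from scipy.linalg import eigh

def regularized_lda(X, y, n_components, reg=0.01):
    """X: (n_samples, n_features), y: class labels, reg: shrinkage weight."""
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_total).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Shrink Sw towards a scaled identity to stabilise the small eigenvalue
    # estimates that dominate under the small sample size problem.
    Sw_reg = (1.0 - reg) * Sw + reg * (np.trace(Sw) / d) * np.eye(d)
    # Solve the generalized eigenproblem Sb w = lambda Sw_reg w and keep the
    # eigenvectors with the largest eigenvalues (the MDF directions).
    eigvals, eigvecs = eigh(Sb, Sw_reg)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]
```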

The use of static training structures, where the input data is not involved in determining the system parameters, has been widespread in the design of pattern classification systems. However, it has been demonstrated that the classification performance can be improved by employing dynamic training structures. In this spirit, the Dynamic face recognition Committee Machine (DCM) was presented in Ref. [10], consisting of five state-of-the-art pattern classification algorithms. This dynamic structure requires the input to be directly involved in the combining mechanism, which employs an integrating unit to adjust the weight of each expert according to the input. A gating network is used to identify the conditions under which the input image was taken and to assign particular weights to each expert. Experimental results indicate that this dynamic structure yields higher recognition rates than a static one where the weights of each expert are fixed. In Ref. [11], the authors derive an owner-specific LDA-subspace in order to create a personalized face verification (2-class classification) system, where the owner identity is the true identity. The training set is partitioned into a number of clusters and the cluster that contains face data most similar to the owner face is identified. The system assigns the owner training images to this particular cluster and this new data set is used to determine an LDA-subspace, which is then used to compute the verification thresholds and the matching score when a test face claims the identity of the owner. The authors show that verification performance is enhanced when owner-specific LDA-subspaces are utilized, rather than the LDA space created by processing the entire training set.

This paper presents a novel framework that uses dynamic training in a multistage clustering process that employs discriminant analysis. For notation compactness, this algorithm shall be referred to as DTMC throughout the rest of this paper. This methodology is not restricted to FR, but is able to deal with any problem that fits into the same formalism. At this point, two terms that are frequently used in this paper must be defined: ‘class’ refers to a set of face images from the same person, whereas ‘cluster’ refers to a set of classes.

Initially, facial feature extraction is carried out by making use of the multilevel 2-D wavelet decomposition (MWD2) algorithm [12], [13], which provides dimensionality reduction and has been shown to be appropriate for classification purposes [6], [14], [15]. Then, the training and test face feature vectors are projected onto an MDF-space that is created by employing the RLDA method of Ref. [9]. Subsequently, the k-means algorithm is used to partition the training data into a set of discriminant clusters. The distance of the test face from the cluster centroids is used to collect a subset of clusters that are closest to the test face. The cardinality of this subset is set through an entropy-based measure that is calculated by making use of the discrete probability histogram. Then, a new MDF-space is created from this cluster subset with its dimensions set so as to reduce classification problems that stem from possible large variations in the set of images of each face class. The training data projected to this new space are again clustered and a new subset that is closer to the test face is selected. This process is repeated in as many iterations as necessary, until a single cluster is selected that contains just one face class. The identity of this face class is set as the best match to the identity of the test face.
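
To make the first step of this pipeline tangible, here is a brief sketch of wavelet-based feature extraction in the spirit of MWD2, using the PyWavelets package: the face image is decomposed over several levels and the low-frequency approximation subband at the deepest level is kept as a reduced-dimensionality feature vector. The wavelet family (`db4`) and the decomposition level are illustrative assumptions, not necessarily the choices made in Refs. [12], [13].

```python
# Hedged sketch of wavelet feature extraction in the spirit of the MWD2 step.
import numpy as np
import pywt

def wavelet_features(image, wavelet="db4", level=3):
    """Decompose a 2-D grayscale face image and keep the deepest
    approximation (low-low) subband as a flattened feature vector."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    return coeffs[0].ravel()

# Example: an ORL-sized (112 x 92) image reduces to a much shorter vector.
face = np.random.rand(112, 92)  # stand-in for an actual face image
print(wavelet_features(face).shape)
```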

The proposed method is computationally efficient, compared to ‘divide and conquer’ techniques such as the one in Ref. [1] where multiple classification results are produced by applying an individual discriminant analysis process and an NN classifier to each cluster. Our method uses a single discriminant analysis operation at each clustering level, with the number of clustering levels being generally much smaller than the number of clusters since only a small subset of the training data is retained at each level. A heavy computational cost also accompanies algorithms that construct hierarchical trees or space tessellations, as is the case with the HDA algorithm in Ref. [4]. The purpose of this type of algorithm is to provide a manageable discriminant solution for each and every face class by recursively subdividing the complete set of training samples into smaller classification problems. On the other hand, at each clustering step our algorithm only has to provide a discriminant solution for the face classes that are closest to the test face; the training data that correspond to the remaining face classes are discarded.

The structure of the DTMC algorithm readily accommodates the addition of new training faces. Specifically, when a new training face is added to the database, the only change needed in the DTMC process is to increase the dimension of the first MDF-space by one. The characteristics of the test face will determine which set of clusters, which may or may not contain the new face class, will be retained for the clustering level that follows. In contrast, the hierarchical tree structure requires a complete re-learning of the full training space since the new MDF-space at the first tree level may lead to an entirely different decomposition result.

The MDF-spaces that the hierarchical tree or space tessellation structures utilize are generated in the learning phase and are not biased by the characteristics of the test face. On the contrary, the MDF-spaces created at each clustering level of the DTMC algorithm are indeed biased with respect to the characteristics of the test face. Based on the conclusions of Refs. [10], [11] that have been summarized above, more accurate classification results are to be expected by DTMC since it employs a dynamic classification structure that utilizes a series of test-face-specific subspaces.

The outline of this paper is as follows: Section 2 describes the feature extraction method that utilizes the MWD2 algorithm, reviews the RLDA method that is used to extract the MDF-spaces before each clustering process and presents the k-means algorithm that is used to partition the training data as well as the entropy-based measure that is used to define the number of clusters that are retained. Section 3 describes the complete DTMC FR methodology that is proposed in this paper. Experimental results are reported in Section 4, where the DTMC methodology is tested using the well-established UMIST [16], ORL [17], and XM2VTS [18] databases in order to assess its recognition capabilities on standard data sets. Moreover, the performance of DTMC is compared to a number of FR algorithms that have been recently proposed by the research community.

Section snippets

Feature selection and the DTMC building blocks

This section briefly describes how the MWD2 algorithm is utilized to extract features from the face images at a selected decomposition level. In addition, the RLDA and k-means algorithms that DTMC uses are briefly reviewed. Finally, the entropy-based measure that is used at each clustering level to select a subset of the training data is presented.
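
Since the exact entropy-based measure appears only in the full text, the snippet below is a hedged sketch of one plausible mechanism: the distances of the test face from the cluster centroids are converted into a discrete probability histogram, and the Shannon entropy of that histogram (through its perplexity) suggests how many of the closest clusters to retain. The inverse-distance weighting and the perplexity mapping are assumptions made purely for illustration, not the measure defined in the paper.

```python
# Illustrative entropy-based rule for selecting a subset of clusters.
import numpy as np

def clusters_to_keep(distances):
    """distances: test-face-to-centroid distances, one entry per cluster.
    Returns the indices of the retained (closest) clusters."""
    inv = 1.0 / (distances + 1e-12)            # closer clusters get more probability mass
    p = inv / inv.sum()                        # discrete probability histogram
    entropy = -np.sum(p * np.log2(p + 1e-12))  # Shannon entropy in bits
    keep = int(np.ceil(2.0 ** entropy))        # perplexity: "effective" number of clusters
    keep = max(1, min(keep, len(distances)))
    return np.argsort(distances)[:keep]
```

When one cluster is clearly closest, the histogram is concentrated, the entropy is near zero and a single cluster is kept; when the test face is ambiguous, the entropy grows and more clusters survive to the next level.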

The DTMC FR methodology

The DTMC algorithm is a multilevel process that, at each level, attempts to solve a redefined classification problem that is formulated by making use of dynamic training. Let us assume that an image $X$ of a test face is to be assigned to one of the $Y$ distinct classes $Y_i$, $i = 1, \ldots, Y$, that lie in the training set space $T$. In addition, let us assume that the $i$th class in $T$ is represented by $N_{Y_i}$ images and the total number of training images is $N_Y$. The face images that comprise the training set $T$ can
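
The snippet above is truncated, so the following is only a high-level sketch of the iterative structure it describes, reusing the illustrative helpers `regularized_lda()` and `clusters_to_keep()` defined earlier. Treating a cluster as a set of classes, each level projects the currently retained training data onto a fresh MDF-space, runs k-means on the projected class means, keeps the entropy-selected subset of clusters closest to the test face, and repeats until a single class survives. The per-level parameters (number of clusters, MDF dimensionality, the thresholds TH and R) are placeholders; their actual settings are described in Sections 3 and 4.

```python
# High-level sketch of the iterative DTMC structure under the above assumptions.
import numpy as np
from sklearn.cluster import KMeans

def dtmc_classify(X_train, y_train, x_test, n_clusters=8, n_components=20):
    """X_train: (n_samples, n_features) wavelet feature vectors,
    y_train: integer class labels, x_test: feature vector of the test face."""
    X, y = np.asarray(X_train), np.asarray(y_train)
    while len(np.unique(y)) > 1:
        classes = np.unique(y)
        # New test-face-specific MDF-space from the currently retained classes.
        W = regularized_lda(X, y, min(n_components, len(classes) - 1))
        Z, z = X @ W, x_test @ W
        # A "cluster" is a set of classes, so k-means runs on the projected class means.
        means = np.vstack([Z[y == c].mean(axis=0) for c in classes])
        k = min(n_clusters, len(classes))
        km = KMeans(n_clusters=k, n_init=10).fit(means)
        dists = np.linalg.norm(km.cluster_centers_ - z, axis=1)
        kept = clusters_to_keep(dists)                     # entropy-selected cluster subset
        kept_classes = classes[np.isin(km.labels_, kept)]
        if len(kept_classes) == len(classes):              # no reduction: fall back to the
            nearest = np.argmin(np.linalg.norm(means - z, axis=1))
            kept_classes = classes[[nearest]]              # class nearest to the test face
        mask = np.isin(y, kept_classes)
        X, y = X[mask], y[mask]                            # discard the remaining classes
    return y[0]                                            # best-matching identity
```

Because only the retained classes are projected at each level, the work per level shrinks as the iteration proceeds, which is the computational advantage over per-cluster discriminant schemes discussed in the Introduction.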

Experimental evaluation of DTMC

In this section, the efficiency of the proposed methodology is evaluated on standard facial image data sets. The classification ability of DTMC is investigated by using data from the ORL, XM2VTS and FERET databases, whereas the UMIST database was used to set the values of the threshold TH and the regularization parameter R at each clustering level, i.e., at each iteration of the DTMC algorithm. Essentially, as in most FR applications, the classification experiments that are carried out fall

Conclusion

A novel FR methodology is proposed and its performance is evaluated. The DTMC algorithm uses dynamic training in a multistage clustering scheme in order to classify a test face by solving a set of simpler classification problems. This process iterates until one final cluster is selected that consists of a single face class, whose identity is set to be the best match to the identity of the test face. Certain parameters of DTMC are defined using the UMIST face database. This method was tested on

Acknowledgements

This work is funded by the network of excellence BioSecure IST-2002-507634 (Biometrics for Secure Authentication, http://www.biosecure.info), under Information Society Technologies (IST) priority of the 6th Framework Programme of the European Community.

References (45)

  • H.-M. Tang, M.R. Lyu, I. King, Face recognition committee machines: dynamic vs. static structures, in: Proceedings of...
  • H.-C. Liu, C.-H. Su, Y.-H. Chiang, Y.-P. Hung, Personalized face verification system using owner-specific...
  • S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell. (1989)
  • I. Daubechies, Ten lectures on wavelets, CBMS-NSF Conference Series in Applied Mathematics, SIAM (Ed.), ...
  • B. Zhang et al., Face recognition by applying wavelet subband representation and kernel associative memories, IEEE Trans. Neural Networks (2004)
  • M. Bicego, U. Castellani, V. Murino, Using hidden Markov models and wavelets for face recognition, in: Proceedings of...
  • D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in: H. ...
  • AT & T Laboratories Cambridge, The Database of Faces, ...
  • J. Luettin et al., Evaluation protocol for the extended M2VTS database (XM2VTSDB)
  • G. Strang et al., Wavelets and Filter Banks (1996)
  • C. Nastar et al., Frequency-based nonrigid motion analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1996)
  • F. Camastra et al., A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

About the Author: MARIOS KYPEROUNTAS received the B.Sc. in Electrical Engineering in 2002 and the M.Sc. in Electrical Engineering in 2003, both from Florida Atlantic University in Boca Raton, Florida.

He was a research assistant at the FAU Imaging Technology Center from 2000 until 2003, where he worked on several high-resolution imaging R&D projects funded by NASA, DARPA and the US Navy. Currently, he is a Ph.D. student at the Artificial Intelligence and Information Analysis lab of the Department of Informatics at the Aristotle University of Thessaloniki and is working as an Image Processing Engineer in Santa Barbara, California. His research interests include high-resolution and ultrasonic imaging, pattern recognition, DSP algorithms and real-time video processing.

    Kyperountas is a member of the Golden Key Honor Society, the Phi-Kappa-Phi Honor Society and the Tau-Beta-Pi Engineering Honor Society.

About the Author: ANASTASIOS TEFAS received the B.Sc. in informatics in 1997 and the Ph.D. degree in informatics in 2002, both from the Aristotle University of Thessaloniki, Greece.

Since 2006, he has been an Assistant Professor at the Department of Information Management, Technological Educational Institute of Kavala. From 1997 to 2002, he was a researcher and teaching assistant in the Department of Informatics, University of Thessaloniki. From 2003 to 2004, he was a temporary lecturer in the Department of Informatics, University of Thessaloniki, where he is currently a senior researcher. He has co-authored over 50 journal and conference papers. His current research interests include computational intelligence, pattern recognition, digital signal and image processing, detection and estimation theory, and computer vision.

About the Author: IOANNIS PITAS received the Diploma of Electrical Engineering in 1980 and the Ph.D. degree in electrical engineering in 1985, both from the University of Thessaloniki, Greece.

Since 1994, he has been a Professor at the Department of Informatics, University of Thessaloniki, Greece. From 1980 to 1993, he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering at the same university. He has served as Visiting Professor and ASI fellow at the University of British Columbia, Canada, as Visiting Professor at Ecole Polytechnique Federale de Lausanne and at Tampere University of Technology, Finland, as Visiting Assistant Professor at the University of Toronto, and as a Visiting Research Associate at the University of Toronto, Canada, and at the University of Erlangen-Nuernberg, Germany. He has published over 510 papers, contributed to 20 books, and has authored, co-authored, edited or co-edited seven books in his areas of interest. His current interests are in the areas of digital image processing, multimedia signal processing, multidimensional signal processing and computer vision.

Dr. Pitas has given 24 invited lectures, has been a member of the program committee of more than 115 scientific conferences and workshops, and has chaired more than 35 conference sessions. He is or has been Associate Editor of the IEEE Transactions on Circuits and Systems, IEEE Transactions on Neural Networks, IEEE Transactions on Image Processing, IJIG, IEICE, and Circuits, Systems and Signal Processing (CSSP), co-editor of Multidimensional Systems and Signal Processing, a member of the editorial board of six journals, and guest editor of six special journal issues. He is a member of the National Research Council of Greece.

1. Present address: 40 S. Patterson Avenue #207, Santa Barbara, CA 93111, USA.
