
Pattern Recognition

Volume 41, Issue 3, March 2008, Pages 894-905

Dynamic training using multistage clustering for face recognition

https://doi.org/10.1016/j.patcog.2007.06.017

Abstract

A novel face recognition algorithm that uses dynamic training in a multistage clustering scheme is presented and evaluated. This algorithm uses discriminant analysis to project the face classes and a clustering algorithm to partition the projected face data, thus forming a set of discriminant clusters. Then, an iterative process creates subsets that contain the most useful clusters, with the cardinality of each subset defined by an entropy-based measure. The best match to the test face is found when only a single face class is retained. This method was tested on the ORL, XM2VTS and FERET face databases, whereas the UMIST database was used to train the proposed algorithm. Experimental results indicate that the proposed framework provides a promising solution to the face recognition problem.

Introduction

Face recognition (FR) is an active research field that has received great attention in the past several years. An FR system usually attempts to determine the identity of the test face by computing and ranking all similarity scores between the test face and all human faces stored in the system database that constitute the training set. However, the performance of many state-of-the-art FR methods deteriorates rapidly when large databases, in terms of the number of faces, are considered [1], [2]. Specifically, the facial feature representation obtained by methods that use linear criteria, which normally require images to follow a convex distribution, is not capable of generalizing over all the variations introduced by, e.g., large differences in viewpoint, illumination and facial expression, when large data sets are used. When nonlinear face representation methods are employed, problems such as over-fitting, computational complexity and difficulties in optimizing the involved parameters often appear [1]. Moreover, the performance of FR methods deteriorates when there is a lack of a sufficiently large number of training samples for each face in the database as, in this case, the intra-person variations cannot be modelled properly. More specifically, linear methods, such as linear discriminant analysis (LDA), often suffer from the small sample size (SSS) problem, where the dimensionality of the samples is larger than the number of available training samples [3].

Recently, various methods have been proposed in order to limit the degradation that the two aforementioned types of problems impose on recognition performance. The ‘divide and conquer’ principle, by which a database is decomposed into smaller sets in order to piecewise learn the complex distribution by a mixture of local linear models, has been widely used. In Ref. [1], a separability criterion is employed to partition a training set from a large database into a set of smaller maximal separability clusters (MSCs) by utilizing an LDA-like technique. Based on these MSCs, a hierarchical classification framework that consists of two levels of nearest neighbour (NN) classifiers is employed and the match is found. The work in Ref. [4] concentrates on the hierarchical partitioning of the feature spaces using hierarchical discriminant analysis (HDA). A space tessellation tree is generated using the most expressive features (MEF), by employing principal component analysis (PCA), and the most discriminating features (MDF), by employing LDA, at each tree level. This is done to avoid the limitations linked to global features, by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples. In general, hierarchical trees have been extensively used for pattern recognition purposes.

LDA is an important statistical tool that has been shown to be effective in FR or verification problems [5], [6]. Traditionally, in order to improve LDA-based methods and provide solutions for the SSS problem, LDA is applied in a lower-dimensional PCA subspace, so as to discard the null space (i.e., the subspace defined by the eigenvectors that correspond to zero eigenvalues) of the within-class scatter matrix of the training data set [5]. However, it has been shown [7] that significant discriminant information is contained in the discarded space and alternative solutions have been sought. Specifically, in Ref. [8] a direct-LDA (DLDA) algorithm is presented that discards the null space of the between-class scatter matrix, which is claimed to contain no useful information, rather than discard the null space of the within-class scatter matrix. More recently, in an attempt to address the SSS problem, the regularized LDA method (RLDA) was presented in Ref. [9], which employs a regularized Fisher's separability criterion. The purpose of regularization is to reduce the high variance related to the eigenvalue estimates of the within-class scatter matrix, at the expense of potentially increased classification bias.
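
As a concrete illustration of the regularization idea, the following is a minimal sketch of a regularized Fisher discriminant projection, written in Python with NumPy and SciPy. It only conveys the general mechanism of shrinking the within-class scatter matrix before solving the Fisher eigenproblem; the function name, the shrinkage form and the parameter `reg` are illustrative assumptions and do not reproduce the exact RLDA formulation of Ref. [9].

```python
# Minimal sketch of a regularized Fisher discriminant projection.
# It illustrates the general idea of regularizing the within-class
# scatter matrix; it is NOT the exact RLDA of Ref. [9].
import numpy as np
from scipy.linalg import eigh

def regularized_lda(X, y, n_components, reg=0.01):
    """X: (n_samples, n_features), y: class labels, reg: shrinkage weight."""
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_total).reshape(-1, 1)
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Shrink Sw towards a scaled identity to stabilise the small eigenvalue
    # estimates that dominate under the small sample size problem.
    Sw_reg = (1.0 - reg) * Sw + reg * (np.trace(Sw) / d) * np.eye(d)
    # Solve the generalized eigenproblem Sb w = lambda Sw_reg w and keep the
    # eigenvectors with the largest eigenvalues (the MDF directions).
    eigvals, eigvecs = eigh(Sb, Sw_reg)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]
```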

The use of static training structures, where the input data is not involved in determining the system parameters, has been widespread in the design of pattern classification systems. However, it has been demonstrated that the classification performance can be improved by employing dynamic training structures. In this spirit, the Dynamic face recognition Committee Machine (DCM) was presented in Ref. [10], consisting of five state-of-the-art pattern classification algorithms. This dynamic structure requires the input to be directly involved in the combining mechanism, which employs an integrating unit to adjust the weight of each expert according to the input. A gating network is used to identify the conditions under which the input image was taken and to assign particular weights to each expert. Experimental results indicate that this dynamic structure yields higher recognition rates than a static one where the weights of each expert are fixed. In Ref. [11], the authors derive an owner-specific LDA-subspace in order to create a personalized face verification (2-class classification) system, where the owner identity is the true identity. The training set is partitioned into a number of clusters and the cluster that contains face data most similar to the owner face is identified. The system assigns the owner training images to this particular cluster and this new data set is used to determine an LDA-subspace, which is then used to compute the verification thresholds and the matching score when a test face claims the identity of the owner. The authors show that verification performance is enhanced when owner-specific LDA-subspaces are utilized, rather than the LDA space created by processing the entire training set.

This paper presents a novel framework that uses dynamic training in a multistage clustering process that employs discriminant analysis. For notation compactness, this algorithm shall be referred to as DTMC throughout the rest of this paper. This methodology is not restricted to FR, but is able to deal with any problem that fits into the same formalism. At this point, two terms that are frequently used in this paper must be defined: ‘class’ refers to a set of face images from the same person, whereas ‘cluster’ refers to a set of classes.

Initially, facial feature extraction is carried out by making use of the multilevel 2-D wavelet decomposition (MWD2) algorithm [12], [13], which provides dimensionality reduction and has been shown to be appropriate for classification purposes [6], [14], [15]. Then, the training and test face feature vectors are projected onto an MDF-space that is created by employing the RLDA method of Ref. [9]. Subsequently, the k-means algorithm is used to partition the training data into a set of discriminant clusters. The distance of the test face from the cluster centroids is used to collect a subset of clusters that are closest to the test face. The cardinality of this subset is set through an entropy-based measure that is calculated by making use of the discrete probability histogram. Then, a new MDF-space is created from this cluster subset with its dimensions set so as to reduce classification problems that stem from possible large variations in the set of images of each face class. The training data projected to this new space are again clustered and a new subset that is closer to the test face is selected. This process is repeated in as many iterations as necessary, until a single cluster is selected that contains just one face class. The identity of this face class is set as the best match to the identity of the test face.
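
To make the first step of this pipeline tangible, here is a brief sketch of wavelet-based feature extraction in the spirit of MWD2, using the PyWavelets package: the face image is decomposed over several levels and the low-frequency approximation subband at the deepest level is kept as a reduced-dimensionality feature vector. The wavelet family (`db4`) and the decomposition level are illustrative assumptions, not necessarily the choices made in Refs. [12], [13].

```python
# Hedged sketch of wavelet feature extraction in the spirit of the MWD2 step.
import numpy as np
import pywt

def wavelet_features(image, wavelet="db4", level=3):
    """Decompose a 2-D grayscale face image and keep the deepest
    approximation (low-low) subband as a flattened feature vector."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    return coeffs[0].ravel()

# Example: an ORL-sized (112 x 92) image reduces to a much shorter vector.
face = np.random.rand(112, 92)  # stand-in for an actual face image
print(wavelet_features(face).shape)
```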

The proposed method is computationally efficient, compared to ‘divide and conquer’ techniques such as the one in Ref. [1] where multiple classification results are produced by applying an individual discriminant analysis process and an NN classifier to each cluster. Our method uses a single discriminant analysis operation at each clustering level, with the number of clustering levels being generally much smaller than the number of clusters since only a small subset of the training data is retained at each level. A heavy computational cost also accompanies algorithms that construct hierarchical trees or space tessellations, as is the case with the HDA algorithm in Ref. [4]. The purpose of this type of algorithm is to provide a manageable discriminant solution for each and every face class by recursively subdividing the complete set of training samples into smaller classification problems. On the other hand, at each clustering step our algorithm only has to provide a discriminant solution for the face classes that are closest to the test face; the training data that correspond to the remaining face classes are discarded.

The structure of the DTMC algorithm readily accommodates the addition of new training faces. Specifically, when a new training face is added to the database, the only change needed in the DTMC process is to increase the dimension of the first MDF-space by one. The characteristics of the test face will determine which set of clusters, which may or may not contain the new face class, will be retained for the clustering level that follows. In contrast, the hierarchical tree structure requires a complete re-learning of the full training space since the new MDF-space at the first tree level may lead to an entirely different decomposition result.

The MDF-spaces that the hierarchical tree or space tessellation structures utilize are generated in the learning phase and are not biased by the characteristics of the test face. On the contrary, the MDF-spaces created at each clustering level of the DTMC algorithm are indeed biased with respect to the characteristics of the test face. Based on the conclusions of Refs. [10], [11] that have been summarized above, more accurate classification results are to be expected by DTMC since it employs a dynamic classification structure that utilizes a series of test-face-specific subspaces.

The outline of this paper is as follows: Section 2 describes the feature extraction method that utilizes the MWD2 algorithm, reviews the RLDA method that is used to extract the MDF-spaces before each clustering process and presents the k-means algorithm that is used to partition the training data as well as the entropy-based measure that is used to define the number of clusters that are retained. Section 3 describes the complete DTMC FR methodology that is proposed in this paper. Experimental results are reported in Section 4, where the DTMC methodology is tested using the well-established UMIST [16], ORL [17], and XM2VTS [18] databases in order to assess its recognition capabilities on standard data sets. Moreover, the performance of DTMC is compared to a number of FR algorithms that have been recently proposed by the research community.

Section snippets

Feature selection and the DTMC building blocks

This section briefly describes how the MWD2 algorithm is utilized to extract features from the face images at a selected decomposition level. In addition, the RLDA and k-means algorithms that DTMC uses are briefly reviewed. Finally, the entropy-based measure that is used at each clustering level to select a subset of the training data is presented.
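
Since the exact entropy-based measure appears only in the full text, the snippet below is a hedged sketch of one plausible mechanism: the distances of the test face from the cluster centroids are converted into a discrete probability histogram, and the Shannon entropy of that histogram (through its perplexity) suggests how many of the closest clusters to retain. The inverse-distance weighting and the perplexity mapping are assumptions made purely for illustration, not the measure defined in the paper.

```python
# Illustrative entropy-based rule for selecting a subset of clusters.
import numpy as np

def clusters_to_keep(distances):
    """distances: test-face-to-centroid distances, one entry per cluster.
    Returns the indices of the retained (closest) clusters."""
    inv = 1.0 / (distances + 1e-12)            # closer clusters get more probability mass
    p = inv / inv.sum()                        # discrete probability histogram
    entropy = -np.sum(p * np.log2(p + 1e-12))  # Shannon entropy in bits
    keep = int(np.ceil(2.0 ** entropy))        # perplexity: "effective" number of clusters
    keep = max(1, min(keep, len(distances)))
    return np.argsort(distances)[:keep]
```

When one cluster is clearly closest, the histogram is concentrated, the entropy is near zero and a single cluster is kept; when the test face is ambiguous, the entropy grows and more clusters survive to the next level.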

The DTMC FR methodology

The DTMC algorithm is a multilevel process that, at each level, attempts to solve a redefined classification problem that is formulated by making use of dynamic training. Let us assume that an image $X$ of a test face is to be assigned to one of the $Y$ distinct classes $Y_i$, $i = 1, \ldots, Y$, that lie in the training set space $T$. In addition, let us assume that the $i$th class in $T$ is represented by $N_{Y_i}$ images and the total number of training images is $N_Y$. The face images that comprise the training set $T$ can
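
The snippet above is truncated, so the following is only a high-level sketch of the iterative structure it describes, reusing the illustrative helpers `regularized_lda()` and `clusters_to_keep()` defined earlier. Treating a cluster as a set of classes, each level projects the currently retained training data onto a fresh MDF-space, runs k-means on the projected class means, keeps the entropy-selected subset of clusters closest to the test face, and repeats until a single class survives. The per-level parameters (number of clusters, MDF dimensionality, the thresholds TH and R) are placeholders; their actual settings are described in Sections 3 and 4.

```python
# High-level sketch of the iterative DTMC structure under the above assumptions.
import numpy as np
from sklearn.cluster import KMeans

def dtmc_classify(X_train, y_train, x_test, n_clusters=8, n_components=20):
    """X_train: (n_samples, n_features) wavelet feature vectors,
    y_train: integer class labels, x_test: feature vector of the test face."""
    X, y = np.asarray(X_train), np.asarray(y_train)
    while len(np.unique(y)) > 1:
        classes = np.unique(y)
        # New test-face-specific MDF-space from the currently retained classes.
        W = regularized_lda(X, y, min(n_components, len(classes) - 1))
        Z, z = X @ W, x_test @ W
        # A "cluster" is a set of classes, so k-means runs on the projected class means.
        means = np.vstack([Z[y == c].mean(axis=0) for c in classes])
        k = min(n_clusters, len(classes))
        km = KMeans(n_clusters=k, n_init=10).fit(means)
        dists = np.linalg.norm(km.cluster_centers_ - z, axis=1)
        kept = clusters_to_keep(dists)                     # entropy-selected cluster subset
        kept_classes = classes[np.isin(km.labels_, kept)]
        if len(kept_classes) == len(classes):              # no reduction: fall back to the
            nearest = np.argmin(np.linalg.norm(means - z, axis=1))
            kept_classes = classes[[nearest]]              # class nearest to the test face
        mask = np.isin(y, kept_classes)
        X, y = X[mask], y[mask]                            # discard the remaining classes
    return y[0]                                            # best-matching identity
```

Because only the retained classes are projected at each level, the work per level shrinks as the iteration proceeds, which is the computational advantage over per-cluster discriminant schemes discussed in the Introduction.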

Experimental evaluation of DTMC

In this section, the efficiency of the proposed methodology is evaluated on standard facial image data sets. The classification ability of DTMC is investigated by using data from the ORL, XM2VTS and FERET databases, whereas the UMIST database was used to set the values of the threshold TH and the regularization parameter R at each clustering level, i.e., at each iteration of the DTMC algorithm. Essentially, as in most FR applications, the classification experiments that are carried out fall

Conclusion

A novel FR methodology is proposed and its performance is evaluated. The DTMC algorithm uses dynamic training in a multistage clustering scheme in order to classify a test face by solving a set of simpler classification problems. This process iterates until one final cluster is selected that consists of a single face class, whose identity is set to be the best match to the identity of the test face. Certain parameters of DTMC are defined using the UMIST face database. This method was tested on

Acknowledgements

This work is funded by the network of excellence BioSecure IST-2002-507634 (Biometrics for Secure Authentication, http://www.biosecure.info), under Information Society Technologies (IST) priority of the 6th Framework Programme of the European Community.

References (45)

  • H.-M. Tang, M.R. Lyu, I. King, Face recognition committee machines: dynamic vs. static structures, in: Proceedings of...
  • H.-C. Liu, C.-H. Su, Y.-H. Chiang, Y.-P. Hung, Personalized face verification system using owner-specific...
  • S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell. (1989)
  • I. Daubechies, Ten lectures on wavelets, CBMS-NSF Conference Series in Applied Mathematics, SIAM (Ed.), ...
  • B. Zhang et al., Face recognition by applying wavelet subband representation and kernel associative memories, IEEE Trans. Neural Networks (2004)
  • M. Bicego, U. Castellani, V. Murino, Using hidden Markov models and wavelets for face recognition, in: Proceedings of...
  • D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in: H. ...
  • AT & T Laboratories Cambridge, The Database of Faces, ...
  • J. Luettin et al., Evaluation protocol for the extended M2VTS database (XM2VTSDB)
  • G. Strang et al., Wavelets and Filter Banks (1996)
  • C. Nastar et al., Frequency-based nonrigid motion analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1996)
  • F. Camastra et al., A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell. (2005)

About the Author: MARIOS KYPEROUNTAS received the B.Sc. in Electrical Engineering in 2002 and the M.Sc. in Electrical Engineering in 2003, both from Florida Atlantic University in Boca Raton, Florida.

He was a research assistant at the FAU Imaging Technology Center from 2000 until 2003, where he worked on several high-resolution imaging R&D projects funded by NASA, DARPA and the US Navy. Currently, he is a Ph.D. student at the Artificial Intelligence and Information Analysis lab of the Department of Informatics at the Aristotle University of Thessaloniki and is working as an Image Processing Engineer in Santa Barbara, California. His research interests include high-resolution and ultrasonic imaging, pattern recognition, DSP algorithms and real-time video processing.

    Kyperountas is a member of the Golden Key Honor Society, the Phi-Kappa-Phi Honor Society and the Tau-Beta-Pi Engineering Honor Society.

About the Author: ANASTASIOS TEFAS received the B.Sc. in informatics in 1997 and the Ph.D. degree in informatics in 2002, both from the Aristotle University of Thessaloniki, Greece.

Since 2006, he has been an Assistant Professor at the Department of Information Management, Technological Educational Institute of Kavala. From 1997 to 2002, he was a researcher and teaching assistant in the Department of Informatics, University of Thessaloniki. From 2003 to 2004, he was a temporary lecturer in the Department of Informatics, University of Thessaloniki, where he is currently a senior researcher. He has co-authored over 50 journal and conference papers. His current research interests include computational intelligence, pattern recognition, digital signal and image processing, detection and estimation theory, and computer vision.

About the Author: IOANNIS PITAS received the Diploma of Electrical Engineering in 1980 and the Ph.D. degree in electrical engineering in 1985, both from the University of Thessaloniki, Greece.

Since 1994, he has been a Professor at the Department of Informatics, University of Thessaloniki, Greece. From 1980 to 1993, he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering at the same university. He has served as Visiting Professor and ASI fellow at the University of British Columbia, Canada, as Visiting Professor at Ecole Polytechnique Federale de Lausanne and at Tampere University of Technology, Finland, as Visiting Assistant Professor at the University of Toronto, and as a Visiting Research Associate at the University of Toronto, Canada, and at the University of Erlangen-Nuernberg, Germany. He has published over 510 papers, contributed to 20 books, and has authored, co-authored, edited or co-edited seven books in his areas of interest. His current interests are in the areas of digital image processing, multimedia signal processing, multidimensional signal processing and computer vision.

Dr. Pitas has given 24 invited lectures, has been a member of the program committee of more than 115 scientific conferences and workshops, and has chaired more than 35 conference sessions. He is or has been Associate Editor of the IEEE Transactions on Circuits and Systems, IEEE Transactions on Neural Networks, IEEE Transactions on Image Processing, IJIG, IEICE, and Circuits, Systems and Signal Processing (CSSP), co-editor of Multidimensional Systems and Signal Processing, a member of the editorial board of six journals, and guest editor of six special journal issues. He is a member of the National Research Council of Greece.

1. Present address: 40 S. Patterson Avenue #207, Santa Barbara, CA 93111, USA.
