Elsevier

Pattern Recognition

Volume 42, Issue 5, May 2009, Pages 764-779
Perturbation LDA: Learning the difference between the class empirical mean and its expectation

https://doi.org/10.1016/j.patcog.2008.09.012

Abstract

Fisher's linear discriminant analysis (LDA) is popular for dimension reduction and extraction of discriminant features in many pattern recognition applications, especially biometric learning. In deriving the Fisher's LDA formulation, there is an assumption that the class empirical mean is equal to its expectation. However, this assumption may not be valid in practice. In this paper, from the “perturbation” perspective, we develop a new algorithm, called perturbation LDA (P-LDA), in which perturbation random vectors are introduced to learn the effect of the difference between the class empirical mean and its expectation in Fisher criterion. This perturbation learning in Fisher criterion would yield new forms of within-class and between-class covariance matrices integrated with some perturbation factors. Moreover, a method is proposed for estimation of the covariance matrices of perturbation random vectors for practical implementation. The proposed P-LDA is evaluated on both synthetic data sets and real face image data sets. Experimental results show that P-LDA outperforms the popular Fisher's LDA-based algorithms in the undersampled case.

Introduction

Data in some applications such as biometric learning are of high dimension, while the available samples for each class are usually limited. In view of this, dimension reduction is desirable, and at the same time it is expected that data of different classes can be more easily separated in the lower-dimensional subspace. Among the techniques developed for this purpose, Fisher's linear discriminant analysis (LDA) [1], [2], [3], [4] has been widely used as a powerful tool for extraction of discriminant features. The basic principle of Fisher's LDA is to find a projection matrix such that the ratio between the between-class variance and the within-class variance is maximized in a lower-dimensional feature subspace.

Due to the curse of high dimensionality and the limited number of training samples, the within-class scatter matrix Sw is always singular, so that classical Fisher's LDA will fail. This kind of singularity problem is called the small sample size problem [5], [6] in Fisher's LDA. So far, some well-known variants of Fisher's LDA have been developed to overcome this problem. Among them, Fisherface (PCA+LDA) [5], nullspace LDA (N-LDA) [6], [7], [8] and regularized LDA (R-LDA) [9], [10], [11], [12], [13] are three representative algorithms. In “PCA+LDA”, Fisher's LDA is performed in a principal component subspace, in which the within-class covariance matrix is of full rank. In N-LDA, the nullspace of the within-class covariance matrix Sw is first extracted, the data are then projected onto that subspace, and finally a discriminant transform that maximizes the between-class variance is found there. In R-LDA, a regularization term, such as λ·I where λ>0, is added to Sw. Some other approaches, such as Direct LDA [14], LDA/QR [15] and some constrained LDA [16], [17], have also been developed. Recently, some efforts have been made to develop two-dimensional LDA techniques (2D-LDA) [18], [19], [20], which operate directly on matrix-form data. A recent study [21] conducts comprehensive theoretical and experimental comparisons between the traditional Fisher's LDA techniques and some representative 2D-LDA algorithms in the undersampled case. It is experimentally shown that some two-dimensional LDA may perform better than Fisherface and some other traditional Fisher's LDA approaches in some cases, but R-LDA always performs better. However, estimation of the regularization parameter in R-LDA is hard. Though cross-validation (CV) is popularly used, it is time consuming. Moreover, it is still hard to fully interpret the impact of this regularization term.
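The singularity of Sw and the effect of the regularization term λ·I can be illustrated with a minimal NumPy sketch. It is not taken from the cited R-LDA papers: the function names, the synthetic data, the value of `lam` and the use of the common generalized eigenvalue formulation are all assumptions made for illustration.

```python
import numpy as np

def within_between_scatter(X, y):
    """Return (S_w, S_b) for data X (N x n) and labels y, weighting classes by N_k / N."""
    N, n = X.shape
    u = X.mean(axis=0)                      # total sample mean
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        u_c = Xc.mean(axis=0)               # class empirical mean
        D = Xc - u_c
        S_w += D.T @ D / N                  # (N_k/N) * (1/N_k) * sum of outer products
        d = (u_c - u)[:, None]
        S_b += (len(Xc) / N) * (d @ d.T)
    return S_w, S_b

def rlda_projection(X, y, lam=1e-2, dim=None):
    """Sketch of R-LDA: solve S_b w = mu (S_w + lam*I) w (assumed formulation)."""
    S_w, S_b = within_between_scatter(X, y)
    n = X.shape[1]
    n_classes = len(np.unique(y))
    dim = dim or n_classes - 1
    S_w_reg = S_w + lam * np.eye(n)         # positive definite for lam > 0
    C = np.linalg.cholesky(S_w_reg)         # S_w_reg = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_b @ C_inv.T               # symmetric, so eigh applies
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:dim]      # largest eigenvalues first
    return C_inv.T @ vecs[:, top]           # map back to the original coordinates

# Undersampled toy case: 12 samples, 3 classes, 20 dimensions (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 20))
y = np.repeat([0, 1, 2], 4)
S_w, S_b = within_between_scatter(X, y)
print("rank of S_w:", np.linalg.matrix_rank(S_w), "of", X.shape[1])
W = rlda_projection(X, y, lam=0.1)
print("projection shape:", W.shape)
```

With fewer samples than dimensions, the rank of S_w cannot exceed N − L, so the unregularized matrix cannot be inverted, while the regularized one admits a Cholesky factorization for any λ>0. The sketch does not address how λ should be chosen, which is the difficulty noted above.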

From the geometrical view, Fisher's LDA scatters the different class means while keeping data of the same class close to their corresponding class means. However, since the number of samples for each class is always limited in some applications such as biometric learning, the estimates of class means are not accurate, and this would degrade the power of Fisher criterion. To specify this problem, we first re-visit the derivation of Fisher's LDA. Consider the classification problem of $L$ classes $C_1,\ldots,C_L$. Suppose the data space $X$ ($\subset \mathbb{R}^n$) is a compact vector space and $\{(x_1^1,y_1^1),\ldots,(x_{N_1}^1,y_{N_1}^1),\ldots,(x_1^L,y_1^L),\ldots,(x_{N_L}^L,y_{N_L}^L)\}$ is a set of finite samples. All data $x_1^1,\ldots,x_{N_1}^1,\ldots,x_1^L,\ldots,x_{N_L}^L$ are iid, $x_i^k$ ($\in X$) denotes the $i$th sample of class $C_k$ with class label $y_i^k$ (i.e., $y_i^k=C_k$), and $N_k$ is the number of samples of class $C_k$. The empirical mean of each class is then given by $\hat{u}_k=\frac{1}{N_k}\sum_{i=1}^{N_k}x_i^k$ and the total sample mean is given by $\hat{u}=\sum_{k=1}^{L}\frac{N_k}{N}\hat{u}_k$, where $N=\sum_{k=1}^{L}N_k$ is the total number of training samples. The goal of LDA under Fisher criterion is to find an optimal projection matrix by optimizing the following Eq. (1):

$$\hat{W}_{opt}=\arg\max_{W}\frac{\operatorname{trace}(W^{T}\hat{S}_b W)}{\operatorname{trace}(W^{T}\hat{S}_w W)}, \tag{1}$$

where $\hat{S}_b$ and $\hat{S}_w$ are the between-class covariance (scatter) matrix and the within-class covariance (scatter) matrix, respectively, defined as follows:

$$\hat{S}_b=\sum_{k=1}^{L}\frac{N_k}{N}(\hat{u}_k-\hat{u})(\hat{u}_k-\hat{u})^{T}, \tag{2}$$

$$\hat{S}_w=\sum_{k=1}^{L}\frac{N_k}{N}\hat{S}_k,\qquad \hat{S}_k=\sum_{i=1}^{N_k}\frac{1}{N_k}(x_i^k-\hat{u}_k)(x_i^k-\hat{u}_k)^{T}.$$

It has been proved in [22] that Eq. (2) could be written equivalently as follows:

$$\hat{S}_b=\frac{1}{2}\sum_{k=1}^{L}\sum_{j=1}^{L}\frac{N_k}{N}\times\frac{N_j}{N}(\hat{u}_k-\hat{u}_j)(\hat{u}_k-\hat{u}_j)^{T}.$$

For formulation of Fisher's LDA, two basic assumptions are always used. First, the class distribution is assumed to be Gaussian. Second, the class empirical mean is in practice used to approximate its expectation. Although Fisher's LDA has attracted attention for more than thirty years, as far as we know, there is little research work addressing the second assumption and investigating the effect of the difference between the class empirical mean and its expectation in Fisher criterion. As we know, $\hat{u}_k$ is the estimate of $E_{x|C_k}[x]$ based on the maximum likelihood criterion, where $E_{x|C_k}[x]$ is the expectation of class $C_k$. The substitution of the expectation $E_{x|C_k}[x]$ with its empirical mean $\hat{u}_k$ is based on the assumption that the sample size for estimation is large enough to reflect the data distribution of each class. Unfortunately, this assumption is not always true in some applications, especially biometric learning. Hence the impact of the difference between those two terms should not be ignored.
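The definitions above can be checked numerically. The following NumPy sketch computes the empirical class means, both forms of the between-class scatter matrix, the within-class scatter matrix and the trace ratio of Eq. (1); the function names and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np

def class_stats(X, y):
    """Return class labels, weights N_k / N and empirical class means."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])         # N_k / N
    means = np.stack([X[y == c].mean(axis=0) for c in classes])   # one row per class
    return classes, priors, means

def between_scatter(priors, means):
    """Between-class scatter around the total sample mean, as in Eq. (2)."""
    u = priors @ means                       # total sample mean
    D = means - u
    return (D * priors[:, None]).T @ D

def between_scatter_pairwise(priors, means):
    """Equivalent pairwise form: half the weighted sum over all class pairs."""
    n = means.shape[1]
    S = np.zeros((n, n))
    for pk, uk in zip(priors, means):
        for pj, uj in zip(priors, means):
            d = (uk - uj)[:, None]
            S += 0.5 * pk * pj * (d @ d.T)
    return S

def within_scatter(X, y, classes, priors, means):
    """Weighted sum of the per-class covariance matrices S_k."""
    n = X.shape[1]
    S = np.zeros((n, n))
    for c, pk, uk in zip(classes, priors, means):
        D = X[y == c] - uk
        S += pk * (D.T @ D) / len(D)
    return S

def fisher_criterion(W, S_b, S_w):
    """Trace ratio of Eq. (1) for a given projection matrix W."""
    return np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)

# Hypothetical data: three classes of unequal size in four dimensions.
rng = np.random.default_rng(0)
sizes = [5, 8, 12]
X = np.vstack([rng.normal(loc=k, size=(m, 4)) for k, m in enumerate(sizes)])
y = np.repeat(np.arange(len(sizes)), sizes)

classes, priors, means = class_stats(X, y)
S_b = between_scatter(priors, means)
S_w = within_scatter(X, y, classes, priors, means)
print("two forms of S_b agree:", np.allclose(S_b, between_scatter_pairwise(priors, means)))
print("criterion for W = I:", fisher_criterion(np.eye(4), S_b, S_w))
```

Every quantity in the sketch is built from the empirical means, which is exactly the substitution questioned in the paragraph above.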

In view of this, this paper will study the effect of the difference between the class empirical mean and its expectation in Fisher criterion. We note that such difference is almost impossible to be specified, since Ex|Ck[x] is usually hard (if not impossible) to be determined. Hence, from the “perturbation” perspective, we introduce the perturbation random vectors to stochastically describe such difference. Based on the proposed perturbation model, we then analyze how perturbation random vectors take effect in Fisher criterion. Finally, perturbation learning will yield new forms of within-class and between-class covariance matrices by integrating some perturbation factors, and therefore a new Fisher's LDA formulation based on these two new estimated covariance matrices is called perturbation LDA (P-LDA). In addition, a semi-perturbation LDA, which gives a novel view to R-LDA, will be finally discussed.
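To give a feeling for why the difference between the class empirical mean and its expectation can be treated as a random vector, the following simulation draws many small training sets from one Gaussian class with a known mean and measures how the empirical mean scatters around it. This is only an illustration under assumed parameters (`mu`, `Sigma`, the sample sizes and the number of trials are hypothetical); it is not the perturbation model proposed in this paper. It relies on the standard fact that, for iid samples, the covariance of the empirical mean is the class covariance divided by the sample size.

```python
import numpy as np

def mean_deviation_cov(mu, Sigma, n_samples, n_trials, rng):
    """Estimate the covariance of (empirical mean - true mean) over repeated draws."""
    # Shape: (n_trials, n_samples, dim); each trial is one small training set.
    X = rng.multivariate_normal(mu, Sigma, size=(n_trials, n_samples))
    dev = X.mean(axis=1) - mu               # one deviation vector per trial
    return dev.T @ dev / n_trials

# Assumed class parameters, chosen only for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
rng = np.random.default_rng(0)

for n_k in (2, 5, 50):
    est = mean_deviation_cov(mu, Sigma, n_k, 20000, rng)
    # Compare the simulated spread with the theoretical value trace(Sigma) / n_k.
    print(n_k, np.trace(est), np.trace(Sigma) / n_k)
```

The spread shrinks as the class sample size grows, which is consistent with the observation that the difference matters most when only a few samples per class are available.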

Although there is some related work on covariance matrix estimation for classifier design, such as RDA [23] and its similar work [24], and EDDA [25], the objective of P-LDA is different from theirs. RDA and EDDA are not based on Fisher criterion and they are classifiers, while P-LDA is a feature extractor and does not predict the class label of any data as output. P-LDA extracts a subspace for dimension reduction but RDA and EDDA do not. Moreover, the perturbation model used in P-LDA has not been considered in RDA and EDDA. Hence the methodology of P-LDA is different from those of RDA and EDDA. This paper focuses on Fisher criterion, while classifier analysis is beyond our scope. To the best of our knowledge, there is no similar work addressing Fisher criterion using the proposed perturbation model.

The remainder of this paper is outlined as follows. The proposed P-LDA will be introduced in Section 2. The implementation details will be presented in Section 3. Then P-LDA is evaluated using three synthetic data sets and three large human face data sets in Section 4. Discussions and the conclusion of this paper are then given in Sections 5 and 6, respectively.

Section snippets

P-LDA: a new formulation

The proposed method is developed based on the idea of perturbation analysis. A theoretical analysis is given and a new formulation is proposed by learning the difference between the class empirical mean and its expectation as well as its impact on the estimation of covariance matrices in Fisher criterion. In Section 2.1, we first consider the case when data of each class follow a single Gaussian distribution. The theory is then extended to the mixture of Gaussian distribution case and reported in

Estimation of perturbation covariance matrices

For implementation of P-LDA, we need to properly estimate two perturbation covariance matrices $S_b^{\Delta}$ and $S_w^{\Delta}$. Parameter estimation is challenging, since it is always ill-posed [3], [23] due to limited sample size and the curse of high dimensionality. A more robust and tractable way to overcome this problem is to perform some regularized estimation. It is indeed the motivation here. A method will be suggested to implement P-LDA with parameter estimation in an entire PCA subspace without discarding

Experimental results

The proposed P-LDA algorithm will be evaluated on both synthetic data and face image data. Face images are typical biometric data: the number of available face training samples for each class is usually very small while the data dimensionality is very high.

This section is divided into three parts. The first and second parts report the experiment results on synthetic data and face data, respectively. In the third part, we verify our parameter estimation strategy on high-dimensional face image

Discussion

As shown in the experiments, the number of training samples for each class has a real impact on the performance of P-LDA. In this section, we explore some theoretical properties of P-LDA and show its convergence. We also discuss the relation between P-LDA and some related methods.

Conclusion

This paper addresses a fundamental research issue in Fisher criterion: the assumption that the class empirical mean is equal to its expectation. This is one of the assumptions made in deriving the Fisher's LDA formulation for practical computation. However, in many pattern recognition applications, especially biometric learning, this assumption may not be true. In view of this, we introduce perturbation random vectors to learn the effect of the difference between the class empirical mean and its expectation in

Acknowledgments

This project was supported by the NSFC (60675016, 60633030), the 973 Program (2006CB303104), NSF of GuangDong (06023194, 2007B030603001) and Earmarked Research Grant HKBU2113/06E from Hong Kong Research Grant Council. The authors would also like to thank the great efforts made by (associate) editor and all reviewers for improvement of this paper.

References (39)

  • J.R. Price et al., Face recognition using direct, weighted linear discriminant analysis and modular subspaces, Pattern Recognition (2005)
  • J. Yang et al., Why can LDA be performed in PCA transformed space?, Pattern Recognition (2003)
  • R.A. Fisher, The statistical utilization of multiple measurements, Ann. Eugen. (1938)
  • D.L. Swets et al., Using discriminant eigenfeatures for image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (1996)
  • A.R. Webb, Statistical Pattern Recognition (2002)
  • W. Zhao et al., Face recognition: a literature survey, ACM Comput. Surv. (2003)
  • P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • H. Cevikalp et al., Discriminative common vectors for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • R. Huang, Q. Liu, H. Lu, S. Ma, Solving the small sample size problem in LDA, ICPR...
    About the Author—WEI-SHI ZHENG was born in Canton (Guangzhou), China, in 1981. He has recently received his Ph.D. degree in Applied Mathematics at Sun Yat-sen University in China. He joined Queen Mary, University of London as a postdoctoral research assistant in August 2008. He is now working on the European SAMURAI Research Project with Prof. Gong Shaogang and Dr. Xiang Tao. Prior to that, he received his B.S. degree in both mathematics and computer science at Sun Yat-sen University in 2003. From April 2006 to October 2006, he was a visiting student working with Prof. Li Stan Z. at the Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. In 2007, he was an exchange research student working with Prof. Yuen Pong C. at Hong Kong Baptist University from May 16 to November 15. Dr. Zheng is a member of the IEEE. His current research interests are in object categorization and semi-supervised learning; he is also interested in techniques for discriminant/sparse feature extraction and dimension reduction, kernel methods in machine learning, and face image analysis.

    About the Author—J.H. LAI was born in 1964. He received the M.Sc. degree in applied mathematics in 1989 and the Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, Guangzhou, China. He joined Sun Yat-sen University in 1989, where he is currently a Professor with the Department of Electronics and Communication Engineering, School of Information Science and Technology. He has published over 50 papers in the international journals, book chapters, and conferences. His current research interests are in the areas of digital image processing, pattern recognition, multimedia communication, wavelets and their applications. Dr. Lai had successfully organized the International Conference on Advances in Biometric Personal Authentication’ 2004, which was also the Fifth Chinese Conference on Biometric Recognition (Sinobiometrics’04), Guangzhou, in December 2004. He has taken charge of more than four research projects, including NSFC (number 60144001, 60373082, 60675016), the Key (Key grant) Project of Chinese Ministry of Education (number 105134), and NSF of Guangdong, China (number 021766, 06023194). Dr. Lai has published over 60 papers and he serves as a board member of the Image and Graphics Association of China and also serves as a board member and secretary-general of the Image and Graphics Association of Guangdong.

    About the Author—PONG C YUEN received his B.Sc. degree in Electronic Engineering with first class honours in 1989 from City Polytechnic of Hong Kong, and his Ph.D. degree in Electrical and Electronic Engineering in 1993 from The University of Hong Kong. He joined the Department of Computer Science, Hong Kong Baptist University in 1993 as an Assistant Professor and currently is a Professor.

    Dr. Yuen was a recipient of the University Fellowship to visit The University of Sydney in 1996. He was associated with the Laboratory of Imaging Science and Engineering, Department of Electrical Engineering and worked with Prof. Hong Yan. In 1998, Dr. Yuen spent a six-month sabbatical leave in the University of Maryland Institute for Advanced Computer Studies (UMIACS), University of Maryland at College Park. He was associated with the Computer Vision Laboratory, CFAR and worked with Prof. Larry Davis. From June 2005 to January 2006, he was a visiting professor in GRAVIR laboratory (GRAphics, VIsion and Robotics) of INRIA Rhone Alpes, France. He was associated with PRIMA Group and worked with Prof. James Crowley. Dr. Yuen was the director of Croucher Advanced Study Institute (ASI) on biometric authentication in 2004 and was the director of Croucher ASI on Biometric Security and Privacy in 2007. Dr. Yuen has been actively involved in many international conferences as an organizing committee and/or technical program committee member. Recently, he was the track co-chair of International Conference on Pattern Recognition 2006. Dr. Yuen is an editorial board member of Pattern Recognition.

    Dr. Yuen's current research interests include human face processing and recognition, biometric security and privacy, context modeling and learning for human activity recognition.

    About the Author—STAN Z. LI received his Ph.D. degree from Surrey University, UK. He is currently a professor at the National Laboratory of Pattern Recognition (NLPR), the director of Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA); and co-director of Joint Laboratory for Intelligent Surveillance and Identification in Civil Aviation (CASIA-CAUC). He worked at Microsoft Research Asia as a researcher from 2000 to 2004. Prior to that, he was an associate professor at Nanyang Technological University, Singapore. His research interests include pattern recognition and machine learning, image and vision processing, face recognition, biometrics, and intelligent video surveillance. He has published over 200 papers in international journals and conferences, and authored and edited 5 books including “Markov Random Field Modeling in Image Analysis” (Springer, 1st edition 1995, 2nd edition 2001, 3rd edition 2008). He is currently an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and has been actively participating in organizing a number of international conferences and workshops in the fields of his research interest. Stan Z. Li is an expert in face recognition, biometrics and intelligent video surveillance. The Eye-CU face recognition system he developed at Microsoft Research Asia was demonstrated by Bill Gates in a CNN interview. He has been leading several national and international collaboration projects in biometrics and intelligent video surveillance. The AuthenMetric face recognition system and intelligent video surveillance system have been deployed in many applications. He acted as the program chair for the Asian Biometrics Forum 2006 and a co-chair for the International Conference on Biometrics 2007 and 2009. He delivered a speech on Biometrics in China, on behalf of the China National Body, at the 2006 ISO/IEC JTC1 SC37 meeting in London. He co-edited Handbook of Face Recognition (Springer, 2005), and is acting as the editor-in-chief for the Encyclopedia of Biometrics (Springer Reference Work, to be published in 2009).
