Linear discriminant analysis using rotational invariant L1 norm
Introduction
In recent years, linear discriminant analysis (LDA) has played an important role in supervised learning, with many successful applications in computer vision and pattern recognition. By maximizing the ratio of the inter-class distance to the intra-class distance, LDA aims to find a linear transformation that achieves maximum class discrimination. Many variants of LDA with different properties have been proposed for discriminant subspace learning. The classical LDA [1], [2] tries to find an optimal discriminant subspace (spanned by the column vectors of a projection matrix) that maximizes the inter-class separability and the intra-class compactness of the data samples in a low-dimensional vector space. In general, the optimal discriminant subspace can be obtained by performing a generalized eigenvalue decomposition on the inter-class and intra-class scatter matrices. However, an intrinsic limitation of the classical LDA is that one of the scatter matrices must be nonsingular. Unfortunately, in many applications (e.g., face recognition) the dimension of the feature space is typically much larger than the size of the training set, so one of the scatter matrices becomes singular. This is well known as the undersampled problem (USP). To address the USP, Fukunaga [3] proposes a regularization method (RM) which adds perturbations to the diagonal entries of the scatter matrices, but the solution obtained by RM is not optimal. More recently, many algorithms have been developed to deal with the USP, including the direct linear discriminant analysis (DLDA) [5] and the null-space linear discriminant analysis (NLDA) [4]. NLDA extracts discriminant information from the null space of the intra-class scatter matrix. In comparison, DLDA extracts the discriminant information from the null space of the intra-class scatter matrix after discarding the null space of the inter-class scatter matrix.
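As a concrete illustration of the USP and of the regularization method, the sketch below (function and variable names are our own, not from [3]) builds a within-class scatter matrix from far fewer samples than dimensions, where it is rank-deficient, and restores nonsingularity by perturbing its diagonal:

```python
import numpy as np

def regularized_within_scatter(X, y, eps=1e-3):
    """Within-class scatter S_w with diagonal regularization (a sketch of
    the regularization method): X is an (N, D) data matrix, y holds the
    (N,) class labels, and eps is the perturbation added to the diagonal
    so that S_w is nonsingular in the undersampled case."""
    D = X.shape[1]
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)   # accumulate per-class scatter
    return Sw + eps * np.eye(D)         # perturb the diagonal entries

# Undersampled case: 6 samples in a 50-dimensional space, so the
# unregularized S_w has rank at most N - L = 4 and is singular.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))
y = np.array([0, 0, 0, 1, 1, 1])
Sw_reg = regularized_within_scatter(X, y)
print(np.linalg.matrix_rank(Sw_reg))    # full rank 50 after regularization
```

Note that, as the paper points out, this restores invertibility but does not yield an optimal discriminant subspace.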
However, NLDA and DLDA may lose discriminant information that is useful for classification. To fully utilize all the discriminant information reflected by the intra-class and inter-class scatter matrices, Wang and Tang [6] propose a dual-space LDA approach. Another way to address the USP is PCA+LDA [7], [8], in which the data are pre-processed by PCA before LDA is applied. However, PCA+LDA may lose important discriminant information in the PCA stage.
More recent LDA algorithms work with higher-order tensor representations. Ye et al. [9] propose a novel LDA algorithm (i.e., 2DLDA) which works with the matrix-based data representation. Also in [9], 2DLDA+LDA is proposed, which applies 2DLDA before LDA for further dimension reduction. Similar to [9], Li and Yuan [18] use image matrices directly instead of vectors for discriminant analysis. Xu et al. [19] propose a novel algorithm (i.e., Concurrent Subspaces Analysis) for dimension reduction by encoding images as second- or even higher-order tensors. Vasilescu and Terzopoulos [15] apply multilinear subspace analysis to construct a compact representation of facial image ensembles factorized by different faces, expressions, viewpoints, and illuminations. Lei et al. [14] propose a novel face recognition algorithm based on discriminant analysis with a Gabor tensor representation. He et al. [11] present a tensor-based algorithm (i.e., tensor subspace analysis) for detecting the underlying nonlinear face manifold structure in the manner of tensor subspace learning. Yan et al. [10] and Tao et al. [13] propose their own subspace learning algorithms (i.e., DATER [10] and GTDA [13]) for discriminant analysis with tensor representations. Wang et al. [12] propose a convergent solution procedure for general tensor-based subspace analysis. Essentially, the aforementioned tensor-based LDA approaches perform well in uncovering the underlying data structures, and as a result they handle the undersampled problem (USP) effectively.
However, all the aforementioned LDA approaches utilize the Frobenius norm to measure the inter-class and intra-class distances. In this case, their training processes may be dominated by outliers since the inter-class or intra-class distance is determined by the sum of squared distances. To reduce the influence of outliers, we propose a novel rotational invariant L1 norm (referred to as R1 norm [16], [17]) based discriminant criterion called DCL1 for robust discriminant analysis. Further, we develop three DCL1-based discriminant algorithms (i.e., 1DL1, 2DL1, and TDL1) for vector-based, matrix-based, and tensor-based representations of data, respectively. In contrast to the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 can reduce the influence of outliers substantially.
Pang et al. [20] propose an L1-norm-based tensor analysis (TPCA-L1) algorithm which is robust to outliers. Compared to conventional tensor analysis algorithms, TPCA-L1 is more efficient due to its eigendecomposition-free property. Zhou and Tao [21] present a gender recognition algorithm called manifold elastic net (MEN), which obtains a sparse solution to supervised subspace learning by using L1 manifold regularization. Especially in the cases of small training sets and lower-dimensional subspaces, it achieves better classification performance than traditional subspace learning algorithms. Pang and Yuan [22] develop an outlier-resisting graph embedding framework (referred to as LPP-L1) for subspace learning. The framework is not only robust to outliers, but also performs well in handling the USP. Zhang et al. [23] propose a discriminative locality alignment (DLA) algorithm for subspace learning. It takes advantage of discriminative subspace selection to distinguish the dimension-reduction contribution of each sample, and preserves discriminative information over local patches of each sample to avoid the USP. Liu et al. [24] present semi-supervised extensions of linear dimension reduction, called transductive component analysis (TCA) and orthogonal transductive component analysis (OTCA), which leverage the intra-class smoothness and the inter-class separability by building two sorts of regularized graphs. Tao et al. [25] propose three criteria for subspace selection. For a c-class classification task, these three criteria are able to effectively stop the merging of nearby classes in the projection to a subspace of the feature space if the dimension of the projected subspace is strictly lower than c−1. Tao et al. [26] incorporate tensor representation into existing supervised learning algorithms, and present a supervised tensor learning (STL) framework to overcome the USP.
Furthermore, several convex optimization techniques and multilinear operations are used to solve the STL problem.
The remainder of the paper is organized as follows. In Section 2, the Frobenius and R1 norms are briefly reviewed. In Section 3, a brief introduction to Linear Discriminant Analysis using the Frobenius norm is given. In Section 4, the details of the proposed DCL1 and its algorithms (1DL1, 2DL1, and TDL1) are described. Experimental results are reported in Section 5. The paper is concluded in Section 6.
Section snippets
Frobenius and R1 norms
Given K data samples $\{\mathcal{X}_k\}_{k=1}^{K}$ with $\mathcal{X}_k \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_n}$, the Frobenius norm is defined as $\|\mathcal{X}\|_F = \big(\sum_{k=1}^{K} \|\mathcal{X}_k\|_F^2\big)^{1/2}$, and the rotational invariant L1 norm (i.e., R1 norm) is defined as $\|\mathcal{X}\|_{R1} = \sum_{k=1}^{K} \|\mathcal{X}_k\|_F$, where $\|\mathcal{X}_k\|_F$ denotes the square root of the sum of the squared entries of $\mathcal{X}_k$. When n=1, the above norms are vector-based; when n=2, they are matrix-based; otherwise, they are tensor-based. In the Euclidean space, the Frobenius norm has a fundamental property: rotational invariance. In comparison, the R1 norm preserves this rotational invariance while weighting each sample's contribution linearly rather than quadratically, which makes it less sensitive to outliers.
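The contrast between the two norms can be illustrated numerically. The helper functions below are our own illustrative names, and the sample values are arbitrary:

```python
import numpy as np

def frobenius_norm(samples):
    # ||X||_F = sqrt(sum_k ||X_k||_F^2): each sample contributes its squared norm.
    return np.sqrt(sum(np.sum(Xk ** 2) for Xk in samples))

def r1_norm(samples):
    # ||X||_R1 = sum_k ||X_k||_F: each sample contributes linearly.
    return sum(np.sqrt(np.sum(Xk ** 2)) for Xk in samples)

samples = [np.ones((2, 2)), np.ones((2, 2))]       # two 2x2 matrix samples
outliers = [np.ones((2, 2)), 10 * np.ones((2, 2))] # second sample is an outlier

# The Frobenius norm squares the outlier's contribution; the R1 norm does not.
print(frobenius_norm(samples), r1_norm(samples))    # 2.828... 4.0
print(frobenius_norm(outliers), r1_norm(outliers))  # 20.099... 22.0
```

Both quantities are unchanged if every sample is multiplied by the same orthogonal rotation, since each per-sample norm $\|\mathcal{X}_k\|_F$ is rotation-invariant; a per-entry L1 norm would not have this property.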
The classical LDA
Given the L-class training samples $\{x_i\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^{D}$ and class labels $l_i \in \{1,\ldots,L\}$, the classical LDA [1], [2] aims to find a linear transformation $W \in \mathbb{R}^{D \times d}$ which embeds the original D-dimensional vector $x$ into the d-dimensional vector space ($d < D$) such that $y = W^{T}x$. Let $\mathrm{tr}(\cdot)$ be the trace of its matrix argument, $S_b$ be the inter-class scatter matrix in $\mathbb{R}^{D \times D}$, and $S_w$ be the intra-class scatter matrix in $\mathbb{R}^{D \times D}$. Thus, the inter-class and intra-class distances in the projected space are, respectively, measured by $\mathrm{tr}(W^{T}S_bW)$ and $\mathrm{tr}(W^{T}S_wW)$.
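A minimal numerical sketch of this procedure follows, assuming a small diagonal regularization of $S_w$ so that the generalized eigenproblem reduces to a standard one; the function name is ours, not the paper's:

```python
import numpy as np

def lda_fit(X, y, d, eps=1e-6):
    """Classical LDA sketch: find W maximizing tr(W^T S_b W) / tr(W^T S_w W)
    via the generalized eigenproblem S_b w = lambda S_w w, solved here as a
    standard eigenproblem on (S_w + eps*I)^{-1} S_b (eps keeps the inverse
    well defined in the undersampled case)."""
    D = X.shape[1]
    mean = X.mean(axis=0)
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)  # inter-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)                   # intra-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + eps * np.eye(D), Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:d]]   # top-d discriminant directions

# Two well-separated Gaussian classes in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
W = lda_fit(X, y, d=1)
Z = X @ W   # 1-D projection in which the two classes separate
```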
R1 norm based discriminant criterion (DCL1)
In the classical LDA, 2DLDA, and DATER, the Frobenius norm is applied to characterize the inter-class separability and intra-class compactness. Due to its sensitivity to outliers, the Frobenius norm is incompetent for robust discriminant analysis. In order to address this problem, we propose a novel R1 norm based discriminant criterion called DCL1, which uses the R1 norm to replace the Frobenius norm as the cost function. As a result, the proposed DCL1 is less sensitive to outliers. The details are given below.
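As an illustration of how the R1 norm replaces the Frobenius norm in the criterion, the sketch below evaluates R1-style inter-class and intra-class distances for a candidate projection W in the vector-based (1DL1) setting. The function name is an assumption of ours, and the paper's actual optimization procedure is not reproduced here:

```python
import numpy as np

def r1_distances(X, y, W):
    """R1-norm inter-/intra-class distances for a projection W: sums of
    Euclidean norms replace sums of squared norms, so a single outlier
    contributes linearly rather than quadratically."""
    mean = X.mean(axis=0)
    inter, intra = 0.0, 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        inter += len(Xc) * np.linalg.norm(W.T @ (mc - mean))   # class-mean spread
        intra += np.linalg.norm((Xc - mc) @ W, axis=1).sum()   # within-class spread
    return inter, intra

# Tiny 2-class example; W projects onto the first coordinate.
X = np.array([[0., 0.], [2., 0.], [4., 0.], [6., 0.]])
y = np.array([0, 0, 1, 1])
W = np.array([[1.], [0.]])
inter, intra = r1_distances(X, y, W)
print(inter, intra)   # 8.0 4.0
```

A candidate W would then be chosen to maximize the ratio of the inter-class to the intra-class distance.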
Experiments
In order to evaluate the performances of the proposed algorithms, five datasets are used in the experiments. The first dataset is a toy set composed of ten samples categorized into two classes with an additional outlier sample. The second dataset is the 20 Newsgroups text dataset,1 which consists of 18 941 documents from 20 classes. To efficiently make classification performance evaluations, we randomly split this text dataset into
Conclusion
In this paper, we have proposed a novel discriminant criterion called DCL1 that better characterizes the intra-class compactness and the inter-class separability by using the R1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (1DL1, 2DL1, and TDL1) have been developed for the vector-based, matrix-based, and tensor-based representations of data, respectively. Compared with the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 can substantially reduce the influence of outliers.
Xi Li received the B.Sc. degree in Communication Engineering from Beihang University, Beijing, China, in 2004. In 2009, he got his Doctoral degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is now a Postdoctoral Researcher in CNRS, Telecom ParisTech, Paris, France. His research interests include computer vision, pattern recognition, and machine learning.
References (27)
- et al., A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition (2001)
- et al., 2D-LDA: a novel statistical linear discriminant analysis for image matrix, Pattern Recognition Letters (2005)
- et al., Outlier-resisting graph embedding, Neurocomputing (2010)
- et al., Pattern Classification (2000)
- Discriminant Analysis and Statistical Pattern Recognition (1992)
- Introduction to Statistical Pattern Recognition, second ed. (1990)
- et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
- X. Wang, X. Tang, Dual-space linear discriminant analysis for face recognition, in: Proceedings of the CVPR, vol. 2,...
- et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
- et al., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
- Tensor subspace analysis
Weiming Hu received the Ph.D. degree from the Department of Computer Science and Engineering, Zhejiang University. From April 1998 to March 2000, he was a Postdoctoral Research Fellow with the Institute of Computer Science and Technology, Founder Research and Design Center, Peking University. Since April 2000, he has been with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, where he is now a Professor and a Ph.D. Student Supervisor. In 2007, he became an IEEE Senior Member and an Associate Editor for IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics. His research interests are in video information processing and network information security. He has published more than 90 papers in national and international journals and at international conferences.
Hanzi Wang is currently a Senior Research Fellow in the Department of Computer Science, the University of Adelaide, Australia. He was an Assistant Research Scientist (2007–2008) and a Postdoctoral Fellow (2006–2007) at the Johns Hopkins University, and a Research Fellow at Monash University, Australia (2004–2006). He received the Ph.D. degree in Computer Vision from Monash University, and was awarded the Douglas Lampard Electrical Engineering Research Prize and Medal for the best Ph.D. thesis in the Department. His research interests center on computer vision and pattern recognition, including visual tracking, robust statistics, video segmentation, model fitting, optical flow calculation, fundamental matrix estimation, image segmentation, and related fields. He is a Senior Member of the IEEE and was listed in Who's Who in Science and Engineering and Who's Who in the World.
Zhongfei (Mark) Zhang received the B.S. degree in Electronics Engineering (with honors) and the M.S. degree in Information Sciences from Zhejiang University, China, and the Ph.D. degree in Computer Science from the University of Massachusetts at Amherst. He is currently an Associate Professor of Computer Science at the Computer Science Department, State University of New York (SUNY) at Binghamton. He was on the Faculty of the Computer Science and Engineering Department, and a Research Scientist at the Center of Excellence for Document Analysis and Recognition, both at SUNY Buffalo. His research interests include multimedia information indexing and retrieval, data mining and knowledge discovery, computer vision and image understanding, pattern recognition, and bioinformatics. His research is sponsored by the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the New York State Government, as well as private industries, including Microsoft and Kodak. He has served as a reviewer/PC member for many conferences and journals, as well as a grant review panelist for governmental and private funding agencies. He has also served as a technical consultant for a number of industrial and governmental organizations. He was an Air Force Research Laboratory Faculty Visiting Fellow and a Microsoft Research Visiting Researcher.
Dr. Zhang is a recipient of a U.S. National Academies/National Research Council Visiting Fellowship, and he took second place in the individual category of the 2004 Western New York Inventor of the Year award. He won the SUNY Chancellor's Promising Inventor Award and the JSPS International Collaboration Award.
1. The author has moved to CNRS, TELECOM ParisTech, France.