Neurocomputing

Volume 73, Issues 13–15, August 2010, Pages 2571–2579

Linear discriminant analysis using rotational invariant L1 norm

https://doi.org/10.1016/j.neucom.2010.05.016

Abstract

Linear discriminant analysis (LDA) is a well-known scheme for supervised subspace learning, and it has been widely used in computer vision and pattern recognition applications. However, an intrinsic limitation of LDA is its sensitivity to the presence of outliers, which stems from using the Frobenius norm to measure the inter-class and intra-class distances. In this paper, we propose a novel rotational invariant L1 norm (i.e., R1 norm) based discriminant criterion (referred to as DCL1), which better characterizes the intra-class compactness and the inter-class separability by using the rotational invariant L1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (i.e., 1DL1, 2DL1, and TDL1) are developed for vector-based, matrix-based, and tensor-based representations of data, respectively. They are capable of reducing the influence of outliers substantially, resulting in robust classification. Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the proposed DCL1 and its algorithms.

Introduction

In recent years, linear discriminant analysis (LDA) has played an important role in supervised learning, with many successful applications in computer vision and pattern recognition. By maximizing the ratio of the inter-class distance to the intra-class distance, LDA aims to find a linear transformation that achieves the maximum class discrimination. Many variations of LDA with different properties have been proposed for discriminant subspace learning. The classical LDA [1], [2] tries to find an optimal discriminant subspace (spanned by the column vectors of a projection matrix) to maximize the inter-class separability and the intra-class compactness of the data samples in a low-dimensional vector space. In general, the optimal discriminant subspace can be obtained by performing the generalized eigenvalue decomposition on the inter-class and the intra-class scatter matrices. However, an intrinsic limitation of the classical LDA is that one of the scatter matrices must be nonsingular. Unfortunately, in many applications (e.g., face recognition) the dimension of the feature space is typically much larger than the size of the training set, making one of the scatter matrices singular. This is well known as the undersampled problem (USP). In order to address the USP, Fukunaga [3] proposes a regularization method (RM) which adds perturbations to the diagonal entries of the scatter matrices, but the solution obtained by RM is not optimal. In recent years, many algorithms have been developed to deal with the USP, including the direct linear discriminant analysis (DLDA) [5] and the null-space linear discriminant analysis (NLDA) [4]. NLDA extracts discriminant information from the null space of the intra-class scatter matrix. In comparison, DLDA extracts the discriminant information from the null space of the intra-class scatter matrix after discarding the null space of the inter-class scatter matrix. However, NLDA and DLDA may lose discriminant information that is useful for classification. To fully utilize the discriminant information reflected by the intra-class and inter-class scatter matrices, Wang and Tang [6] propose a dual-space LDA approach. Another approach to address the USP is PCA+LDA [7], [8], in which the data are pre-processed by PCA before LDA; however, PCA+LDA may lose important discriminant information in the PCA stage.
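As a concrete illustration of this pipeline, the following NumPy sketch computes the two scatter matrices, applies an RM-style diagonal perturbation so that the intra-class scatter stays nonsingular under the USP, and solves the generalized eigenproblem. It is a minimal sketch under our own naming conventions, not the implementation used in the paper.

```python
import numpy as np

def lda_directions(X, y, n_components, reg=1e-3):
    """Classical LDA via the generalized eigenproblem S_b u = lambda S_w u.

    X: (N, D) data matrix; y: (N,) integer class labels.
    reg: small diagonal perturbation in the spirit of the regularization
    method [3], keeping S_w nonsingular under the undersampled problem.
    """
    D = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((D, D))
    S_b = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)        # intra-class scatter
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)          # inter-class scatter
    S_w += reg * np.eye(D)                            # RM-style perturbation
    # Generalized eigendecomposition: columns of the result span the subspace
    # maximizing the ratio of inter-class to intra-class distance.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real
```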

More recent LDA algorithms work with higher-order tensor representations. Ye et al. [9] propose a novel LDA algorithm (i.e., 2DLDA) which works with the matrix-based data representation; in [9], 2DLDA+LDA is also proposed, in which 2DLDA is applied for dimension reduction before LDA. Similar to [9], Li and Yuan [18] use image matrices directly, instead of vectors, for discriminant analysis. Xu et al. [19] propose a novel algorithm (i.e., Concurrent Subspaces Analysis) for dimension reduction by encoding images as 2nd or even higher order tensors. Vasilescu and Terzopoulos [15] apply multilinear subspace analysis to construct a compact representation of facial image ensembles factorized by different faces, expressions, viewpoints, and illuminations. Lei et al. [14] propose a novel face recognition algorithm based on discriminant analysis with a Gabor tensor representation. He et al. [11] present a tensor-based algorithm (i.e., tensor subspace analysis) for detecting the underlying nonlinear face manifold structure in the manner of tensor subspace learning. Yan et al. [10] and Tao et al. [13] propose their own subspace learning algorithms (i.e., DATER [10] and GTDA [13]) for discriminant analysis with tensor representations. Wang et al. [12] propose a convergent solution procedure for general tensor-based subspace analysis. Essentially, the aforementioned tensor-based LDA approaches perform well in uncovering the underlying data structures, and are thus able to handle the USP effectively.

However, all the aforementioned LDA approaches utilize the Frobenius norm to measure the inter-class and intra-class distances. In this case, their training processes may be dominated by outliers since the inter-class or intra-class distance is determined by the sum of squared distances. To reduce the influence of outliers, we propose a novel rotational invariant L1 norm (referred to as R1 norm [16], [17]) based discriminant criterion called DCL1 for robust discriminant analysis. Further, we develop three DCL1-based discriminant algorithms (i.e., 1DL1, 2DL1, and TDL1) for vector-based, matrix-based, and tensor-based representations of data, respectively. In contrast to the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 can reduce the influence of outliers substantially.
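A toy numerical check (with synthetic data of our own) makes this sensitivity concrete: under a sum of squared distances a single gross outlier supplies almost the entire cost, while under an R1-style sum of unsquared norms its share stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))        # ten well-behaved samples
X[0] *= 50.0                        # turn one sample into a gross outlier

norms = np.linalg.norm(X, axis=1)
frob_sq = np.sum(norms ** 2)        # Frobenius-style cost: sum of squares
r1 = np.sum(norms)                  # R1-style cost: sum of norms

print("outlier share (squared):", norms[0] ** 2 / frob_sq)  # close to 1
print("outlier share (R1):     ", norms[0] / r1)            # much smaller
```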

Pang et al. [20] propose an L1-norm-based tensor analysis (TPCA-L1) algorithm which is robust to outliers. Compared to conventional tensor analysis algorithms, TPCA-L1 is more efficient due to its eigendecomposition-free property. Zhou and Tao [21] present a gender recognition algorithm called manifold elastic net (MEN), which obtains a sparse solution to supervised subspace learning by using L1 manifold regularization. Especially in the cases of small training sets and lower-dimensional subspaces, it achieves better classification performance than traditional subspace learning algorithms. Pang and Yuan [22] develop an outlier-resisting graph embedding framework (referred to as LPP-L1) for subspace learning; the framework is not only robust to outliers but also performs well in handling the USP. Zhang et al. [23] propose a discriminative locality alignment (DLA) algorithm for subspace learning, which takes advantage of discriminative subspace selection to distinguish the dimension reduction contribution of each sample, and preserves discriminative information over local patches of each sample to avoid the USP. Liu et al. [24] present semi-supervised extensions of linear dimension reduction, called transductive component analysis (TCA) and orthogonal transductive component analysis (OTCA), which leverage the intra-class smoothness and the inter-class separability by building two sorts of regularized graphs. Tao et al. [25] propose three criteria for subspace selection; for the c-class classification task, these criteria are able to effectively prevent the merging of nearby classes in the projection to a subspace of the feature space when the dimension of the projected subspace is strictly lower than c−1. Tao et al. [26] incorporate tensor representations into existing supervised learning algorithms and present a supervised tensor learning (STL) framework to overcome the USP; several convex optimization techniques and multilinear operations are used to solve the STL problem.

The remainder of the paper is organized as follows. In Section 2, the Frobenius and R1 norms are briefly reviewed. In Section 3, a brief introduction to Linear Discriminant Analysis using the Frobenius norm is given. In Section 4, the details of the proposed DCL1 and its algorithms (1DL1, 2DL1, and TDL1) are described. Experimental results are reported in Section 5. The paper is concluded in Section 6.

Frobenius and R1 norms

Given $K$ data samples $\mathcal{X}=\{\chi_k\}_{k=1}^{K}$ with $\chi_k=(x^{k}_{d_1 d_2\cdots d_n})\in\mathbb{R}^{D_1\times D_2\times\cdots\times D_n}$, the Frobenius norm is defined as
$$\|\mathcal{X}\|=\sqrt{\sum_{k=1}^{K}\sum_{d_1=1}^{D_1}\sum_{d_2=1}^{D_2}\cdots\sum_{d_n=1}^{D_n}\left(x^{k}_{d_1 d_2\cdots d_n}\right)^{2}}=\sqrt{\sum_{k=1}^{K}\|\chi_k\|^{2}}.$$
The rotational invariant L1 norm (i.e., R1 norm) is defined as
$$\|\mathcal{X}\|_{R1}=\sum_{k=1}^{K}\sqrt{\sum_{d_1=1}^{D_1}\sum_{d_2=1}^{D_2}\cdots\sum_{d_n=1}^{D_n}\left(x^{k}_{d_1 d_2\cdots d_n}\right)^{2}}=\sum_{k=1}^{K}\sqrt{\|\chi_k\|^{2}}=\sum_{k=1}^{K}\|\chi_k\|.$$
When $n=1$, the above norms are vector-based; when $n=2$, they are matrix-based; otherwise, they are tensor-based. In the Euclidean space, the Frobenius norm has a fundamental property: rotational invariance. In comparison, the R1 norm preserves this rotational invariance, which the entrywise L1 norm lacks.
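In code, the two definitions differ only in where the square root is taken relative to the per-sample sum. A minimal NumPy sketch (our own names) that works for vector, matrix, and tensor samples alike:

```python
import numpy as np

def frobenius_norm(samples):
    # ||X|| = sqrt(sum_k ||chi_k||^2): square root over the total sum.
    return np.sqrt(sum(np.sum(chi ** 2) for chi in samples))

def r1_norm(samples):
    # ||X||_R1 = sum_k ||chi_k||: square root taken per sample, then summed.
    return sum(np.sqrt(np.sum(chi ** 2)) for chi in samples)

rng = np.random.default_rng(1)
samples = [rng.normal(size=(4, 3, 2)) for _ in range(5)]  # order-3 tensors
print(frobenius_norm(samples), r1_norm(samples))
```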

The classical LDA

Given the $L$-class training samples $\mathcal{D}=\{\{y_i^{\ell}\}_{i=1}^{N_\ell}\}_{\ell=1}^{L}$ with $y_i^{\ell}\in\mathbb{R}^{D\times 1}$ and $N=\sum_{\ell=1}^{L}N_\ell$, the classical LDA [1], [2] aims to find a linear transformation $U\in\mathbb{R}^{D\times\zeta}$ which embeds the original $D$-dimensional vector $y_i^{\ell}$ into the $\zeta$-dimensional vector space $U$ such that $\zeta<D$. Let $\mathrm{Tr}(\cdot)$ be the trace of its matrix argument, $S_b^U$ be the inter-class scatter matrix in $U$, and $S_w^U$ be the intra-class scatter matrix in $U$. Thus, the inter-class and intra-class distances in $U$ are, respectively, measured by $\mathrm{Tr}(S_b^U)$ and $\mathrm{Tr}(S_w^U)$.
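Because the trace of a scatter matrix equals a sum of squared Euclidean deviations, both distances can be computed directly from the projected data. A minimal sketch, with names of our choosing:

```python
import numpy as np

def scatter_traces(Y, labels, U):
    """Return Tr(S_b^U) and Tr(S_w^U) for data projected by U.

    Y: (N, D) row-stacked training vectors; U: (D, zeta) projection matrix.
    """
    Z = Y @ U                                  # embed into the zeta-dim subspace
    mean_all = Z.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mean_c = Zc.mean(axis=0)
        tr_w += np.sum((Zc - mean_c) ** 2)     # Tr(S_w^U): intra-class distance
        tr_b += Zc.shape[0] * np.sum((mean_c - mean_all) ** 2)  # Tr(S_b^U)
    return tr_b, tr_w
```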

R1 norm based discriminant criterion (DCL1)

In the classical LDA, 2DLDA, and DATER, the Frobenius norm is applied to characterize the inter-class separability and intra-class compactness. Due to its sensitivity to outliers, the Frobenius norm is incompetent for robust discriminant analysis. In order to address this problem, we propose a novel R1 norm based discriminant criterion called DCL1, which uses the R1 norm to replace the Frobenius norm as the cost function. As a result, the proposed DCL1 is less sensitive to outliers. The details of the DCL1 and its algorithms are described below.
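While the full derivation is deferred to the body of the paper, the gist of the criterion can be sketched by replacing each squared deviation in the trace-based distances with its unsquared R1-norm counterpart. The following is our illustrative reading of DCL1 for the vector case (1DL1), not the authors' exact optimization procedure:

```python
import numpy as np

def dcl1_objective(Y, labels, U):
    """R1-norm analogue of the LDA criterion: distances enter unsquared,
    so a single outlier can no longer dominate the cost (our sketch)."""
    Z = Y @ U
    mean_all = Z.mean(axis=0)
    inter = intra = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mean_c = Zc.mean(axis=0)
        intra += np.sum(np.linalg.norm(Zc - mean_c, axis=1))      # R1 intra-class
        inter += Zc.shape[0] * np.linalg.norm(mean_c - mean_all)  # R1 inter-class
    return inter / intra  # to be maximized over U
```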

Experiments

In order to evaluate the performance of the proposed algorithms, five datasets are used in the experiments. The first dataset is a toy set composed of ten samples categorized into two classes, with an additional outlier sample. The second dataset is the 20 Newsgroups text dataset,1 which consists of 18,941 documents from 20 classes. To efficiently make classification performance evaluations, we randomly split this text dataset into smaller subsets.

Conclusion

In this paper, we have proposed a novel discriminant criterion called DCL1 that better characterizes the intra-class compactness and the inter-class separability by using the R1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (1DL1, 2DL1, and TDL1) have been developed for the vector-based, matrix-based, and tensor-based representations of data, respectively. Compared with the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 substantially reduce the influence of outliers, resulting in robust classification.

References (27)

  • H. Yu et al., A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition (2001)
  • M. Li et al., 2D-LDA: a novel statistical linear discriminant analysis for image matrix, Pattern Recognition Letters (2005)
  • Y. Pang et al., Outlier-resisting graph embedding, Neurocomputing (2010)
  • R.O. Duda et al., Pattern Classification (2000)
  • J.M. Geoffrey, Discriminant Analysis and Statistical Pattern Recognition (1992)
  • K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. (1990)
  • F. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
  • X. Wang, X. Tang, Dual-space linear discriminant analysis for face recognition, in: Proceedings of the CVPR, vol. 2, ...
  • P. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • D.L. Swets et al., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
  • J. Ye, R. Janardan, Q. Li, Two-dimensional linear discriminant analysis, in: NIPS, vol. 2, 2004, pp. ...
  • S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, H. Zhang, Discriminant analysis with tensor representation, in: Proceedings ...
  • X. He et al., Tensor subspace analysis

Xi Li received the B.Sc. degree in Communication Engineering from Beihang University, Beijing, China, in 2004. In 2009, he received his doctoral degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is now a Postdoctoral Researcher at CNRS, Telecom ParisTech, Paris, France. His research interests include computer vision, pattern recognition, and machine learning.

Weiming Hu received the Ph.D. degree from the Department of Computer Science and Engineering, Zhejiang University. From April 1998 to March 2000, he was a Postdoctoral Research Fellow with the Institute of Computer Science and Technology, Founder Research and Design Center, Peking University. Since April 2000, he has been with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, where he is now a Professor and a Ph.D. student supervisor. In 2007, he became an IEEE Senior Member and an Associate Editor for IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics. His research interests are in video information processing and network information security. He has published more than 90 papers in national and international journals and at international conferences.

Hanzi Wang is currently a Senior Research Fellow in the Department of Computer Science, the University of Adelaide, Australia. He was an Assistant Research Scientist (2007–2008) and a Postdoctoral Fellow (2006–2007) at Johns Hopkins University, and a Research Fellow at Monash University, Australia (2004–2006). He received the Ph.D. degree in Computer Vision from Monash University, where he was awarded the Douglas Lampard Electrical Engineering Research Prize and Medal for the best Ph.D. thesis in the department. His research interests are concentrated on computer vision and pattern recognition, including visual tracking, robust statistics, video segmentation, model fitting, optical flow computation, fundamental matrix estimation, image segmentation, and related fields. He is a Senior Member of the IEEE and has been listed in Who's Who in Science and Engineering and Who's Who in the World.

Zhongfei (Mark) Zhang received the B.S. degree in Electronics Engineering (with honors) and the M.S. degree in Information Sciences from Zhejiang University, China, and the Ph.D. degree in Computer Science from the University of Massachusetts at Amherst. He is currently an Associate Professor of Computer Science in the Computer Science Department, State University of New York (SUNY) at Binghamton. He was on the faculty of the Computer Science and Engineering Department, and a Research Scientist at the Center of Excellence for Document Analysis and Recognition, both at SUNY Buffalo. His research interests include multimedia information indexing and retrieval, data mining and knowledge discovery, computer vision and image understanding, pattern recognition, and bioinformatics. His research is sponsored by the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the New York State Government, as well as private industry, including Microsoft and Kodak. He has served as a reviewer/PC member for many conferences and journals, as well as a grant review panelist for governmental and private funding agencies. He has also served as a technical consultant for a number of industrial and governmental organizations. He was an Air Force Research Laboratory Faculty Visiting Fellow and a Microsoft Research Visiting Researcher.

Dr. Zhang is a recipient of a U.S. National Academies/National Research Council Visiting Fellowship, and he took second place in the individual category of the Western New York 2004 Inventor of the Year. He won the SUNY Chancellor's Promising Inventor Award and the JSPS International Collaboration Award.

1 The author has moved to CNRS, TELECOM ParisTech, France.
