Neurocomputing

Volume 73, Issues 13–15, August 2010, Pages 2571–2579

Linear discriminant analysis using rotational invariant L1 norm

https://doi.org/10.1016/j.neucom.2010.05.016

Abstract

Linear discriminant analysis (LDA) is a well-known scheme for supervised subspace learning, and it has been widely used in computer vision and pattern recognition applications. However, an intrinsic limitation of LDA is its sensitivity to the presence of outliers, which stems from using the Frobenius norm to measure the inter-class and intra-class distances. In this paper, we propose a novel rotational invariant L1 norm (i.e., R1 norm) based discriminant criterion (referred to as DCL1), which better characterizes the intra-class compactness and the inter-class separability by using the rotational invariant L1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (i.e., 1DL1, 2DL1, and TDL1) are developed for vector-based, matrix-based, and tensor-based representations of data, respectively. They are capable of reducing the influence of outliers substantially, resulting in robust classification. Theoretical analysis and experimental evaluations demonstrate the promise and effectiveness of the proposed DCL1 and its algorithms.

Introduction

In recent years, linear discriminant analysis (LDA) has played an important role in supervised learning, with many successful applications in computer vision and pattern recognition. By maximizing the ratio of the inter-class distance to the intra-class distance, LDA aims to find a linear transformation that achieves the maximum class discrimination. Many variations of LDA with different properties have been proposed for discriminant subspace learning. The classical LDA [1], [2] tries to find an optimal discriminant subspace (spanned by the column vectors of a projection matrix) to maximize the inter-class separability and the intra-class compactness of the data samples in a low-dimensional vector space. In general, the optimal discriminant subspace can be obtained by performing the generalized eigenvalue decomposition on the inter-class and the intra-class scatter matrices. However, an intrinsic limitation of the classical LDA is that one of the scatter matrices must be nonsingular. Unfortunately, in many applications (e.g., face recognition) the dimension of the feature space is typically much larger than the size of the training set, making one of the scatter matrices singular. This is well known as the undersampled problem (USP). In order to address the USP, Fukunaga [3] proposes a regularization method (RM) which adds perturbations to the diagonal entries of the scatter matrices, but the solution obtained by RM is not optimal. In recent years, many algorithms have been developed to deal with the USP, including the direct linear discriminant analysis (DLDA) [5] and the null-space linear discriminant analysis (NLDA) [4]. NLDA extracts discriminant information from the null space of the intra-class scatter matrix. In comparison, DLDA extracts the discriminant information from the null space of the intra-class scatter matrix after discarding the null space of the inter-class scatter matrix. However, NLDA and DLDA may lose discriminant information that is useful for classification. To fully utilize the discriminant information reflected by the intra-class and inter-class scatter matrices, Wang and Tang [6] propose a dual-space LDA approach. Another approach to address the USP is PCA+LDA [7], [8], in which the data are pre-processed by PCA before LDA; however, PCA+LDA may lose important discriminant information in the PCA stage.
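As a concrete illustration of this pipeline, the following NumPy sketch computes the two scatter matrices, applies an RM-style diagonal perturbation so that the intra-class scatter stays nonsingular under the USP, and solves the generalized eigenproblem. It is a minimal sketch under our own naming conventions, not the implementation used in the paper.

```python
import numpy as np

def lda_directions(X, y, n_components, reg=1e-3):
    """Classical LDA via the generalized eigenproblem S_b u = lambda S_w u.

    X: (N, D) data matrix; y: (N,) integer class labels.
    reg: small diagonal perturbation in the spirit of the regularization
    method [3], keeping S_w nonsingular under the undersampled problem.
    """
    D = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((D, D))
    S_b = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)        # intra-class scatter
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)          # inter-class scatter
    S_w += reg * np.eye(D)                            # RM-style perturbation
    # Generalized eigendecomposition: columns of the result span the subspace
    # maximizing the ratio of inter-class to intra-class distance.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real
```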

More recent LDA algorithms work with higher-order tensor representations. Ye et al. [9] propose a novel LDA algorithm (i.e., 2DLDA) which works with the matrix-based data representation; in [9], 2DLDA+LDA is also proposed, in which 2DLDA is applied for dimension reduction before LDA. Similar to [9], Li and Yuan [18] use image matrices directly, instead of vectors, for discriminant analysis. Xu et al. [19] propose a novel algorithm (i.e., Concurrent Subspaces Analysis) for dimension reduction by encoding images as 2nd or even higher order tensors. Vasilescu and Terzopoulos [15] apply multilinear subspace analysis to construct a compact representation of facial image ensembles factorized by different faces, expressions, viewpoints, and illuminations. Lei et al. [14] propose a novel face recognition algorithm based on discriminant analysis with a Gabor tensor representation. He et al. [11] present a tensor-based algorithm (i.e., tensor subspace analysis) for detecting the underlying nonlinear face manifold structure in the manner of tensor subspace learning. Yan et al. [10] and Tao et al. [13] propose their own subspace learning algorithms (i.e., DATER [10] and GTDA [13]) for discriminant analysis with tensor representations. Wang et al. [12] propose a convergent solution procedure for general tensor-based subspace analysis. Essentially, the aforementioned tensor-based LDA approaches perform well in uncovering the underlying data structures, and are thus able to handle the USP effectively.

However, all the aforementioned LDA approaches utilize the Frobenius norm to measure the inter-class and intra-class distances. In this case, their training processes may be dominated by outliers since the inter-class or intra-class distance is determined by the sum of squared distances. To reduce the influence of outliers, we propose a novel rotational invariant L1 norm (referred to as R1 norm [16], [17]) based discriminant criterion called DCL1 for robust discriminant analysis. Further, we develop three DCL1-based discriminant algorithms (i.e., 1DL1, 2DL1, and TDL1) for vector-based, matrix-based, and tensor-based representations of data, respectively. In contrast to the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 can reduce the influence of outliers substantially.
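A toy numerical check (with synthetic data of our own) makes this sensitivity concrete: under a sum of squared distances a single gross outlier supplies almost the entire cost, while under an R1-style sum of unsquared norms its share stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))        # ten well-behaved samples
X[0] *= 50.0                        # turn one sample into a gross outlier

norms = np.linalg.norm(X, axis=1)
frob_sq = np.sum(norms ** 2)        # Frobenius-style cost: sum of squares
r1 = np.sum(norms)                  # R1-style cost: sum of norms

print("outlier share (squared):", norms[0] ** 2 / frob_sq)  # close to 1
print("outlier share (R1):     ", norms[0] / r1)            # much smaller
```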

Pang et al. [20] propose an L1-norm-based tensor analysis (TPCA-L1) algorithm which is robust to outliers. Compared to conventional tensor analysis algorithms, TPCA-L1 is more efficient due to its eigendecomposition-free property. Zhou and Tao [21] present a gender recognition algorithm called manifold elastic net (MEN), which obtains a sparse solution to supervised subspace learning by using L1 manifold regularization. Especially in the cases of small training sets and lower-dimensional subspaces, it achieves better classification performance than traditional subspace learning algorithms. Pang and Yuan [22] develop an outlier-resisting graph embedding framework (referred to as LPP-L1) for subspace learning; the framework is not only robust to outliers but also performs well in handling the USP. Zhang et al. [23] propose a discriminative locality alignment (DLA) algorithm for subspace learning, which takes advantage of discriminative subspace selection to distinguish the dimension reduction contribution of each sample, and preserves discriminative information over local patches of each sample to avoid the USP. Liu et al. [24] present semi-supervised extensions of linear dimension reduction, called transductive component analysis (TCA) and orthogonal transductive component analysis (OTCA), which leverage the intra-class smoothness and the inter-class separability by building two sorts of regularized graphs. Tao et al. [25] propose three criteria for subspace selection; for the c-class classification task, these criteria are able to effectively prevent the merging of nearby classes in the projection to a subspace of the feature space when the dimension of the projected subspace is strictly lower than c−1. Tao et al. [26] incorporate tensor representations into existing supervised learning algorithms and present a supervised tensor learning (STL) framework to overcome the USP; several convex optimization techniques and multilinear operations are used to solve the STL problem.

The remainder of the paper is organized as follows. In Section 2, the Frobenius and R1 norms are briefly reviewed. In Section 3, a brief introduction to Linear Discriminant Analysis using the Frobenius norm is given. In Section 4, the details of the proposed DCL1 and its algorithms (1DL1, 2DL1, and TDL1) are described. Experimental results are reported in Section 5. The paper is concluded in Section 6.

Frobenius and R1 norms

Given $K$ data samples $\mathcal{X}=\{\chi_k\}_{k=1}^{K}$ with $\chi_k=(x^{k}_{d_1 d_2\cdots d_n})\in\mathbb{R}^{D_1\times D_2\times\cdots\times D_n}$, the Frobenius norm is defined as
$$\|\mathcal{X}\|=\sqrt{\sum_{k=1}^{K}\sum_{d_1=1}^{D_1}\sum_{d_2=1}^{D_2}\cdots\sum_{d_n=1}^{D_n}\left(x^{k}_{d_1 d_2\cdots d_n}\right)^{2}}=\sqrt{\sum_{k=1}^{K}\|\chi_k\|^{2}}.$$
The rotational invariant L1 norm (i.e., R1 norm) is defined as
$$\|\mathcal{X}\|_{R1}=\sum_{k=1}^{K}\sqrt{\sum_{d_1=1}^{D_1}\sum_{d_2=1}^{D_2}\cdots\sum_{d_n=1}^{D_n}\left(x^{k}_{d_1 d_2\cdots d_n}\right)^{2}}=\sum_{k=1}^{K}\sqrt{\|\chi_k\|^{2}}=\sum_{k=1}^{K}\|\chi_k\|.$$
When $n=1$, the above norms are vector-based; when $n=2$, they are matrix-based; otherwise, they are tensor-based. In the Euclidean space, the Frobenius norm has a fundamental property: rotational invariance. In comparison, the R1 norm preserves this rotational invariance, which the entrywise L1 norm lacks.
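In code, the two definitions differ only in where the square root is taken relative to the per-sample sum. A minimal NumPy sketch (our own names) that works for vector, matrix, and tensor samples alike:

```python
import numpy as np

def frobenius_norm(samples):
    # ||X|| = sqrt(sum_k ||chi_k||^2): square root over the total sum.
    return np.sqrt(sum(np.sum(chi ** 2) for chi in samples))

def r1_norm(samples):
    # ||X||_R1 = sum_k ||chi_k||: square root taken per sample, then summed.
    return sum(np.sqrt(np.sum(chi ** 2)) for chi in samples)

rng = np.random.default_rng(1)
samples = [rng.normal(size=(4, 3, 2)) for _ in range(5)]  # order-3 tensors
print(frobenius_norm(samples), r1_norm(samples))
```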

The classical LDA

Given the $L$-class training samples $\mathcal{D}=\{\{y_i^{\ell}\}_{i=1}^{N_\ell}\}_{\ell=1}^{L}$ with $y_i^{\ell}\in\mathbb{R}^{D\times 1}$ and $N=\sum_{\ell=1}^{L}N_\ell$, the classical LDA [1], [2] aims to find a linear transformation $U\in\mathbb{R}^{D\times\zeta}$ which embeds the original $D$-dimensional vector $y_i^{\ell}$ into the $\zeta$-dimensional vector space $U$ such that $\zeta<D$. Let $\mathrm{Tr}(\cdot)$ be the trace of its matrix argument, $S_b^U$ be the inter-class scatter matrix in $U$, and $S_w^U$ be the intra-class scatter matrix in $U$. Thus, the inter-class and intra-class distances in $U$ are, respectively, measured by $\mathrm{Tr}(S_b^U)$ and $\mathrm{Tr}(S_w^U)$.
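Because the trace of a scatter matrix equals a sum of squared Euclidean deviations, both distances can be computed directly from the projected data. A minimal sketch, with names of our choosing:

```python
import numpy as np

def scatter_traces(Y, labels, U):
    """Return Tr(S_b^U) and Tr(S_w^U) for data projected by U.

    Y: (N, D) row-stacked training vectors; U: (D, zeta) projection matrix.
    """
    Z = Y @ U                                  # embed into the zeta-dim subspace
    mean_all = Z.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mean_c = Zc.mean(axis=0)
        tr_w += np.sum((Zc - mean_c) ** 2)     # Tr(S_w^U): intra-class distance
        tr_b += Zc.shape[0] * np.sum((mean_c - mean_all) ** 2)  # Tr(S_b^U)
    return tr_b, tr_w
```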

R1 norm based discriminant criterion (DCL1)

In the classical LDA, 2DLDA, and DATER, the Frobenius norm is applied to characterize the inter-class separability and intra-class compactness. Due to its sensitivity to outliers, the Frobenius norm is incompetent for robust discriminant analysis. In order to address this problem, we propose a novel R1 norm based discriminant criterion called DCL1, which uses the R1 norm to replace the Frobenius norm as the cost function. As a result, the proposed DCL1 is less sensitive to outliers. The details of the DCL1 and its algorithms are described below.
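While the full derivation is deferred to the body of the paper, the gist of the criterion can be sketched by replacing each squared deviation in the trace-based distances with its unsquared R1-norm counterpart. The following is our illustrative reading of DCL1 for the vector case (1DL1), not the authors' exact optimization procedure:

```python
import numpy as np

def dcl1_objective(Y, labels, U):
    """R1-norm analogue of the LDA criterion: distances enter unsquared,
    so a single outlier can no longer dominate the cost (our sketch)."""
    Z = Y @ U
    mean_all = Z.mean(axis=0)
    inter = intra = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mean_c = Zc.mean(axis=0)
        intra += np.sum(np.linalg.norm(Zc - mean_c, axis=1))      # R1 intra-class
        inter += Zc.shape[0] * np.linalg.norm(mean_c - mean_all)  # R1 inter-class
    return inter / intra  # to be maximized over U
```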

Experiments

In order to evaluate the performance of the proposed algorithms, five datasets are used in the experiments. The first dataset is a toy set composed of ten samples categorized into two classes, with an additional outlier sample. The second dataset is the 20 Newsgroups text dataset,1 which consists of 18,941 documents from 20 classes. To efficiently make classification performance evaluations, we randomly split this text dataset into smaller subsets.

Conclusion

In this paper, we have proposed a novel discriminant criterion called DCL1 that better characterizes the intra-class compactness and the inter-class separability by using the R1 norm instead of the Frobenius norm. Based on the DCL1, three subspace learning algorithms (1DL1, 2DL1, and TDL1) have been developed for the vector-based, matrix-based, and tensor-based representations of data, respectively. Compared with the classical LDA [1], 2DLDA [9], and DATER [10], the developed 1DL1, 2DL1, and TDL1 substantially reduce the influence of outliers, resulting in robust classification.

References (27)

  • H. Yu et al., A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition (2001)
  • M. Li et al., 2D-LDA: a novel statistical linear discriminant analysis for image matrix, Pattern Recognition Letters (2005)
  • Y. Pang et al., Outlier-resisting graph embedding, Neurocomputing (2010)
  • R.O. Duda et al., Pattern Classification (2000)
  • J.M. Geoffrey, Discriminant Analysis and Statistical Pattern Recognition (1992)
  • K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. (1990)
  • F. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition (2000)
  • X. Wang, X. Tang, Dual-space linear discriminant analysis for face recognition, in: Proceedings of the CVPR, vol. 2, ...
  • P. Belhumeur et al., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • D.L. Swets et al., Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
  • J. Ye, R. Janardan, Q. Li, Two-dimensional linear discriminant analysis, in: NIPS, vol. 2, 2004, pp. ...
  • S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, H. Zhang, Discriminant analysis with tensor representation, in: Proceedings ...
  • X. He et al., Tensor subspace analysis

Xi Li received the B.Sc. degree in Communication Engineering from Beihang University, Beijing, China, in 2004. In 2009, he received his doctoral degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is now a Postdoctoral Researcher at CNRS, Telecom ParisTech, Paris, France. His research interests include computer vision, pattern recognition, and machine learning.

Weiming Hu received the Ph.D. degree from the Department of Computer Science and Engineering, Zhejiang University. From April 1998 to March 2000, he was a Postdoctoral Research Fellow with the Institute of Computer Science and Technology, Founder Research and Design Center, Peking University. Since April 2000, he has been with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, where he is now a Professor and a Ph.D. student supervisor. In 2007, he became an IEEE Senior Member and an Associate Editor for IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics. His research interests are in video information processing and network information security. He has published more than 90 papers in national and international journals and at international conferences.

Hanzi Wang is currently a Senior Research Fellow in the Department of Computer Science, the University of Adelaide, Australia. He was an Assistant Research Scientist (2007–2008) and a Postdoctoral Fellow (2006–2007) at Johns Hopkins University, and a Research Fellow at Monash University, Australia (2004–2006). He received the Ph.D. degree in Computer Vision from Monash University, where he was awarded the Douglas Lampard Electrical Engineering Research Prize and Medal for the best Ph.D. thesis in the department. His research interests are concentrated on computer vision and pattern recognition, including visual tracking, robust statistics, video segmentation, model fitting, optical flow computation, fundamental matrix estimation, image segmentation, and related fields. He is a Senior Member of the IEEE and has been listed in Who's Who in Science and Engineering and Who's Who in the World.

Zhongfei (Mark) Zhang received the B.S. degree in Electronics Engineering (with honors) and the M.S. degree in Information Sciences from Zhejiang University, China, and the Ph.D. degree in Computer Science from the University of Massachusetts at Amherst. He is currently an Associate Professor of Computer Science in the Computer Science Department, State University of New York (SUNY) at Binghamton. He was on the faculty of the Computer Science and Engineering Department, and a Research Scientist at the Center of Excellence for Document Analysis and Recognition, both at SUNY Buffalo. His research interests include multimedia information indexing and retrieval, data mining and knowledge discovery, computer vision and image understanding, pattern recognition, and bioinformatics. His research is sponsored by the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the New York State Government, as well as private industry, including Microsoft and Kodak. He has served as a reviewer/PC member for many conferences and journals, as well as a grant review panelist for governmental and private funding agencies. He has also served as a technical consultant for a number of industrial and governmental organizations. He was an Air Force Research Laboratory Faculty Visiting Fellow and a Microsoft Research Visiting Researcher.

Dr. Zhang is a recipient of a U.S. National Academies/National Research Council Visiting Fellowship, and he took second place in the individual category of the Western New York 2004 Inventor of the Year. He won the SUNY Chancellor's Promising Inventor Award and the JSPS International Collaboration Award.

1 The author has moved to CNRS, TELECOM ParisTech, France.
