Pattern Recognition

Volume 111, March 2021, 107664

Representative null space LDA for discriminative dimensionality reduction

https://doi.org/10.1016/j.patcog.2020.107664

Highlights

  • The research reveals the main problem of the classic null space LDA method: the intrinsic overfitting problem.

  • A new approach, representative null space LDA (RNLDA), is proposed to solve the overfitting problem.

  • Practical and efficient RNLDA algorithms and an automatic parameter setting algorithm are proposed.

Abstract

Null space Linear Discriminant Analysis (NLDA) was proposed twenty years ago to overcome the singularity problem of LDA in practical applications. Over two decades of technical development, many Discriminative Dimensionality Reduction (DDR) methods that outperform NLDA have been proposed. This paper provides new insight into NLDA and illustrates that NLDA is much more powerful once its inherent problem, intrinsic overfitting, is solved. An ideal NLDA model is proposed to analyze the overfitting problem. Based on the ideal NLDA model, a more reasonable Representative NLDA (RNLDA) method is proposed to prevent overfitting. Two simple but efficient algorithms, with a theoretical proof of correctness, are proposed to implement the RNLDA method. It is further shown theoretically that the classical but simple hold-out pretraining method can automatically set the method's only parameter to achieve high performance. Extensive experiments on eight databases demonstrate the superior performance of the RNLDA method over state-of-the-art DDR methods.

Introduction

As sensing and storage hardware improve, an increasing amount of high-dimensional data must be processed in many fields. Due to the curse of dimensionality [1], processing data in their original high dimension is computationally expensive or even impossible. Therefore, Dimensionality Reduction (DR) plays an important role in processing these data. The main goal of DR is to map high-dimensional data into a low-dimensional space with minimal loss of the desired information. In most applications, including machine learning and pattern recognition, discriminative information is desired; hence, DR is also referred to as Discriminative Dimensionality Reduction (DDR). In addition to reducing computation and storage, DDR usually plays another role, feature selection, which renders subsequent classification more precise in the obtained low-dimensional subspace than in the original high dimension. Therefore, DDR methods are also referred to as feature selection methods [2], [3]. Depending on whether label information is employed in training, DDR methods can be classified into three types: unsupervised methods [4], [5], semisupervised methods [6], [7], and supervised methods [8], [9]. Generally, supervised DDR methods outperform the other two types because they utilize label information during training.

Linear Discriminant Analysis (LDA) is a classical and well-known supervised DDR method [8], [10]. It seeks a low-dimensional subspace that maximizes the ratio of the between-class difference to the within-class difference. Numerous variations of the original LDA method have been proposed to improve it from different perspectives.

The main problem that LDA confronts in many practical applications is the singularity, or Small Sample Size (SSS), problem [11]. When the dimension of the data is considerably larger than the number of training samples, the within-class and total scatter matrices are singular, and LDA cannot be applied. To address the singularity issue, numerous variations of LDA have been proposed. Direct LDA (DLDA) [12] employs two main steps to overcome the singularity problem: first, a transformation matrix is computed to transform the training data to the range space of the between-class scatter matrix; second, the dimensionality of the transformed data is further reduced using regulating matrices. In Regularized LDA (RLDA) [13], [14], a small perturbation is added to the within-class scatter matrix to make it nonsingular. Subspace LDA (SLDA) is the most popular variation, with extensive applications: the original data are first projected to a lower-dimensional subspace using PCA, which makes the transformed within-class scatter matrix full rank, and LDA is then utilized to further reduce the dimensionality [15]. The method is therefore also referred to as PCA + LDA. Null space LDA (NLDA) [16] projects data into the null space of the within-class scatter matrix and then maximizes the between-class difference in that null space. Many other variations overcome the singularity problem in different ways, such as Pseudo-inverse LDA [17], Orthogonal LDA [18], and Angle Linear Discriminant Embedding [19].
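Since NLDA is the method this paper revisits, a minimal NumPy sketch of the two NLDA steps just described may help. The variable names, the numerical tolerance, and the eigendecomposition route are illustrative assumptions, not the authors' implementation; in the SSS regime, practical implementations usually first reduce to the range of the total scatter matrix for efficiency.

import numpy as np

def nlda(X, y, n_components):
    # X: (D, N) data matrix with one sample per column; y: (N,) integer labels.
    D, N = X.shape
    mean = X.mean(axis=1, keepdims=True)

    # Within-class and between-class scatter matrices.
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - mean) @ (mc - mean).T

    # Step 1: a basis of the null space of Sw (numerically zero eigenvalues).
    evals, evecs = np.linalg.eigh(Sw)
    Q = evecs[:, evals < 1e-10 * evals.max()]

    # Step 2: maximize the between-class scatter inside that null space.
    evals_b, evecs_b = np.linalg.eigh(Q.T @ Sb @ Q)
    order = np.argsort(evals_b)[::-1][:n_components]
    return Q @ evecs_b[:, order]   # projection matrix W, shape (D, n_components)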

In addition to addressing the singularity issue, many other methods attempt to improve the modeling of LDA. LDA utilizes the between-class, within-class, or total scatter matrices to model the global structure of the data, but many researchers have reported that learning the local structure of the data is advantageous. Based on this idea, numerous local structure-based DDR methods have been proposed. For example, Locality Sensitive Discriminant Analysis (LSDA) considers the k-nearest neighbors of each training sample [20]: among the neighbors, pairs from the same class represent the within-class difference, and pairs from different classes represent the between-class difference. In Marginal Fisher Analysis (MFA) [9], the k1-nearest neighbors within the same class of each sample represent the within-class difference, and the k2 nearest pairs of samples from different classes represent the between-class difference. Manifold Partition Discriminant Analysis (MPDA) [21] combines the pairwise differences of neighboring samples with the tangent space [22] and achieves high performance. Many other local structure-based methods benefit from neighboring samples in different ways [23], [24], [25], [26]. On the other hand, in recent years, many researchers have claimed that LDA is not sufficiently robust due to its use of the L2-norm and have attempted to improve LDA by introducing the L1-norm. For example, L1-LDA [27] was reported to be more robust than LDA both theoretically and experimentally. Recently, a new formulation of LDA, Robust LDA (RLDA) [28], was proposed to improve robustness using joint L2,1-norm minimization on the objective function, together with an efficient iterative algorithm for the resulting L2,1-norm minimization problem. There is also a kind of variation that attempts to introduce a nonlinear property [29]. The underlying principle in these methods is the sparse nature of signals, and the L1-norm is a useful tool to exploit it. Very recently, further developments based on sparsity and low-rank preservation, which benefit from the sparse nature, have been made [30], [31].
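To make the neighbor-based constructions concrete, here is a rough sketch of how MFA-style sample pairs could be selected. This is one illustrative reading of the description above, not the cited implementation, and the O(N^2) distance computation is only suitable for small training sets.

import numpy as np

def mfa_pairs(X, y, k1=3, k2=10):
    # X: (D, N) data, y: (N,) integer label array. Returns two pair lists.
    y = np.asarray(y)
    N = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # (N, N) squared distances

    # Within-class pairs: each sample with its k1 nearest same-class neighbors.
    within = []
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        within += [(i, j) for j in same[np.argsort(d2[i, same])[:k1]]]

    # Between-class pairs: the k2 closest pairs across different classes.
    ii, jj = np.where(y[:, None] != y[None, :])
    ii, jj = ii[ii < jj], jj[ii < jj]        # count each unordered pair once
    order = np.argsort(d2[ii, jj])[:k2]
    between = list(zip(ii[order], jj[order]))
    return within, between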

As previously mentioned, NLDA, which was proposed 20 years ago, is a typical variation of LDA aimed at overcoming the singularity problem. However, compared with SLDA (PCA + LDA), its application is not extensive. Moreover, with the development of new DDR methods reported to achieve higher performance, e.g., the local structure-based methods and the L1-norm-based methods, research on and application of NLDA have become scarce. This paper provides new insight into NLDA and illustrates that the performance of NLDA is limited by its intrinsic overfitting problem. By solving the overfitting problem, NLDA can achieve very high performance that is even superior to that of state-of-the-art DDR methods.

The main contributions of this paper are summarized as follows:

  • (1)

    It is pointed out that the main problem of NLDA is the intrinsic overfitting problem. An ideal NLDA model is proposed to analyze this overfitting problem. Based on the ideal NLDA model, a more reasonable Representative NLDA (RNLDA) method is proposed to prevent overfitting.

  • (2)

    Practical and efficient RNLDA algorithms and an automatic parameter setting algorithm are proposed to implement the RNLDA method. First, two simple but efficient RNLDA algorithms are proposed. Second, the results of the algorithms are theoretically proven to be the solution of the RNLDA method. Last, it is analyzed and proven that the classical but simple hold-out pretraining method can automatically set the only parameter to achieve high performance (a sketch of this hold-out selection follows the list).
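As a rough illustration of the hold-out idea, the sketch below splits the training set, trains for each candidate parameter value, and keeps the value with the best hold-out accuracy. It is a minimal sketch under assumed interfaces: `fit` stands for any trainer returning a projection matrix (e.g., an RNLDA implementation parameterized by the candidate value), and 1-NN accuracy is one plausible hold-out score; neither is the authors' exact procedure.

import numpy as np

def one_nn_accuracy(Ztr, ytr, Zte, yte):
    # 1-NN classification accuracy in the reduced space; Z* are (d, n) matrices.
    d2 = ((Zte[:, :, None] - Ztr[:, None, :]) ** 2).sum(axis=0)
    return float((ytr[np.argmin(d2, axis=1)] == yte).mean())

def select_parameter(fit, X, y, candidates, holdout_ratio=0.3, seed=0):
    # fit(X, y, value) -> projection matrix W of shape (D, d).
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[1])
    n_hold = int(holdout_ratio * len(idx))
    hold, train = idx[:n_hold], idx[n_hold:]

    best_val, best_acc = None, -np.inf
    for val in candidates:
        W = fit(X[:, train], y[train], val)
        acc = one_nn_accuracy(W.T @ X[:, train], y[train],
                              W.T @ X[:, hold], y[hold])
        if acc > best_acc:
            best_val, best_acc = val, acc
    return best_val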

Extensive experiments on eight widely used and publicly available databases were conducted to verify the proposed method. The results demonstrate the superior performance of the RNLDA method over state-of-the-art DDR methods.


Null space LDA and the intrinsic overfitting problem

In many pattern recognition applications, the raw data, e.g., images, are not easily distinguished by computers. Learning the differences among distinct subjects from the training samples is necessary to identify new images. The classic LDA achieves this task by balancing pushing apart samples from different classes and pulling together samples from the same class. Let $X=[x_{11},\dots,x_{1k},\dots,x_{c1},\dots,x_{ck}]\in\mathbb{R}^{D\times N}$ be the $N$ training samples belonging to $c$ classes, and assume that each class
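For reference, with the notation above (and $k$ samples per class), the scatter matrices underlying this balance can be written in their standard form, which is consistent with but not copied from the truncated full text:

$S_w=\sum_{i=1}^{c}\sum_{j=1}^{k}\left(x_{ij}-\bar{x}_i\right)\left(x_{ij}-\bar{x}_i\right)^{\top},\qquad S_b=\sum_{i=1}^{c}k\left(\bar{x}_i-\bar{x}\right)\left(\bar{x}_i-\bar{x}\right)^{\top},$

where $\bar{x}_i$ is the mean of class $i$ and $\bar{x}$ is the global mean. NLDA then seeks

$W^{\ast}=\arg\max_{W:\;W^{\top}S_wW=0}\operatorname{tr}\left(W^{\top}S_bW\right).$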

Ideal NLDA model

Assume that we have an ideal training data set without noise, in which the ideal image of each subject is known: $X_0=[x_1^0,x_2^0,\dots,x_c^0]$. The $K$ within-class variation patterns contain all possible variations in the practical data set: $E=[e_1,e_2,\dots,e_K]$. The ideal training set can be represented as $\hat{X}=[\hat{x}_{11},\dots,\hat{x}_{1K},\dots,\hat{x}_{c1},\dots,\hat{x}_{cK}]$, where $\hat{x}_{ij}=x_i^0+e_j$. Recall that a practical data set is $X=[x_{11},\dots,x_{1k},\dots,x_{c1},\dots,x_{ck}]$, where $x_{ij}\ (j\in\{1,\dots,k\})$ can be considered to be randomly sampled from $[\hat{x}_{i1},\dots,\hat{x}_{iK}]$ combined with the
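As a concrete, purely synthetic illustration of this model, the sketch below generates an ideal set and a practical subsample with assumed toy dimensions; the truncated sentence above ends before naming the additional term combined with the sampled variations, so that term is omitted here.

import numpy as np

# Toy construction of the ideal NLDA model: c ideal images x_i^0, K shared
# within-class variation patterns e_j, and a practical set that samples only
# k of the K variations per class. All sizes are arbitrary demo values.
rng = np.random.default_rng(0)
D, c, K, k = 100, 5, 20, 3

X0 = rng.standard_normal((D, c))   # ideal images x_1^0, ..., x_c^0
E = rng.standard_normal((D, K))    # variation patterns e_1, ..., e_K

# Ideal set: every class exhibits every variation, \hat{x}_{ij} = x_i^0 + e_j.
X_ideal = np.hstack([X0[:, [i]] + E for i in range(c)])            # (D, c*K)

# Practical set: each class shows only k randomly sampled variations
# (the truncated text combines this with a further term, omitted here).
X_practical = np.hstack([X0[:, [i]] + E[:, rng.choice(K, k, replace=False)]
                         for i in range(c)])                       # (D, c*k)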

Data sets

In the experiments, eight prevalent and publicly available databases are utilized: six face image databases (AR, CMU PIE, UE95, UE96, ORL, and GT), one handwritten digit image database (MNIST), and one object image database (COIL-100).

The AR database [34] contains face images of 126 individuals, with 26 different images per individual taken in two sessions separated by two weeks. The face images contain variations such as illumination changes, expressions and

Conclusions

This paper provided new insight into NLDA. It was shown that the intrinsic overfitting problem of NLDA limits its performance. An ideal NLDA model was proposed, and based on it, a new method, RNLDA, was proposed to prevent overfitting. Practical RNLDA algorithms and an automatic parameter setting algorithm were also proposed to implement the RNLDA method. Theoretical proofs concerning the method and algorithms were provided. Extensive experiments with eight comparative

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [Grant numbers 51775497 and 51775498].


References (41)

  • M. Sugiyama et al., Semi-supervised local Fisher discriminant analysis for dimensionality reduction, Mach. Learn. (2010)

  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. (1936)

  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)

  • C.R. Rao, The utilization of multiple measurements in problems of biological classification, J. R. Stat. Soc. (1948)

  • W.J. Krzanowski et al., Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data, Appl. Stat. (1995)

  • D. Dai et al., Face recognition by regularized discriminant analysis, IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) (2007)

  • J. Friedman, Regularized discriminant analysis, J. Am. Stat. Assoc. (1989)

  • P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)

  • M. Skurichina et al., Stabilizing classifiers for very small sample sizes, in: Proceedings of the 13th IEEE International Conference on Pattern Recognition (1996)

  • J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, J. Mach. Learn. Res. (2005)

    Zaixing He received his B.S. and M.S. degrees in Engineering from Zhejiang University, China in 2006 and 2008, respectively, and then the Ph.D. degree in Graduate School of Information Science and Technology from Hokkaido University, Japan in 2012. He is currently an associate professor at Zhejiang University, China. His research interests include image processing, and pattern recognition.

    Mengtian Wu received his B.Sc. degree in Mechanical Engineering from Sichuan University, China in 2014. He is currently a Ph.D. candidate in the School of Mechanical Engineering, Zhejiang University, China. His research interests include image processing, computer vision, and deep learning.

    Xinyue Zhao received the M.S. degree in Mechanical Engineering from Zhejiang University, China in 2008, and the Ph.D degree in Graduate School of Information Science and Technology from Hokkaido University, Japan in 2012. She is currently an associate professor at Zhejiang University, China. Her research interests include computer vision and image processing.

    Shuyou Zhang received the M.S. degree in Mechanical Engineering and the Ph.D. degree in State Key Lab. Of CAD&CG from Zhejiang University, China, in 1991 and 1999, respectively. He is currently a professor at Department of Mechanical Engineering, Zhejiang University, China. He is also the vice administer of Institute of Engineering & Computer Graphics in Zhejiang University, assistant director of Computer Graphics Professional Committee for China Engineering Graphic Society, member of Product Digital Design Professional Committee, and chairman of Zhejiang Engineering Graphic Society. His research interests include product digital design, design and stimulation for complex equipments, and engineering and computer graphics.

    Jianrong Tan received the M.S. from Huazhong University of Science and Technology, Wuhan, China in 1987 and the Ph.D. from Zhejiang University in 1992. He is currently a professor at State Key Laboratory of CAD & CG, Zhejiang University. He is an academician of China Engineering Academy. His main research interests include virtual-reality-based simulation, machine learning, CAX and robotics.
