Complexity-reduced implementations of complete and null-space-based linear discriminant analysis
Introduction
Many applications, including face recognition, machine learning and data mining, involve data in high-dimensional spaces. Such high-dimensional data sets present new challenges for data processing and analysis. To discover the intrinsic structure of high-dimensional data and to manipulate it efficiently, it is essential to reduce the dimensionality significantly. Dimensionality reduction has therefore become a ubiquitous preprocessing step: after it, high-dimensional data can be transformed into a low-dimensional space with limited loss of information.
In past decades, many dimensionality reduction algorithms have been proposed to handle such high-dimensional data. Two well-known dimensionality reduction methods are principal component analysis (PCA), an unsupervised learning algorithm, and linear discriminant analysis (LDA) (Duda et al., 2000, Fukunaga, 1990), a supervised learning technique. Generally, LDA is more suitable than PCA for classification problems (Duda et al., 2000).
LDA attempts to find an optimal projection matrix under which the samples of different classes are as far from each other as possible while the samples of the same class are as close to each other as possible. By applying a generalized eigen-decomposition to the within-class scatter matrix (S_w) and the between-class scatter matrix (S_b) of the given training samples, the optimal projection matrix can be readily computed. LDA has been widely applied for decades in many applications such as text classification (Cai, He, & Han, 2008), face recognition (Belhumeur et al., 1997, Jin, Yang, Hu et al., 2001, Lu et al., 2003), information retrieval (Berry et al., 1995, Howland et al., 2003), pattern recognition (Bishop, 2006, Fukunaga, 1990), and microarray data analysis (Baldi and Hatfield, 2002, Dudoit et al., 2002).
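When the within-class scatter matrix is nonsingular, this generalized eigen-decomposition reduces to an ordinary eigenproblem on S_w^{-1} S_b. The sketch below illustrates that classical step; the function name and interface are illustrative, not from the paper, and it assumes S_w and S_b are given and S_w is invertible:

```python
import numpy as np

def lda(Sw, Sb, k):
    """Classical LDA sketch: return the k eigenvectors of Sw^{-1} Sb with
    the largest eigenvalues, one discriminant vector per column. Assumes
    Sw is nonsingular, i.e. no small-sample-size problem."""
    lam, V = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-lam.real)          # sort eigenvalues descending
    return V[:, order[:k]].real            # top-k discriminant vectors
```

For a two-class problem with S_w = I, the leading discriminant vector is parallel to the difference of the class means, which gives a quick sanity check on the routine.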
The classical LDA requires the total scatter matrix (S_t) to be nonsingular. In many applications, however, the number of samples is smaller than the dimensionality of the samples, a situation known as the small sample size (SSS) problem or undersampled problem (Howland et al., 2006, Krzanowski et al., 1995), so that all the scatter matrices are singular. As a result, the classical LDA cannot be applied to the SSS problem directly.
In order to make the classical LDA applicable to the SSS problem, many LDA-based methods, e.g. PCA + LDA (Belhumeur et al., 1997), uncorrelated LDA (ULDA) (Jin, Yang, Hu et al., 2001, Jin, Yang, Tang et al., 2001, Ye, 2005, Ye et al., 2006), null-space-based LDA (NLDA) (Chen et al., 2000, Chu and Thye, 2010, Huang et al., 2002, Sharma and Paliwal, 2012), LDA/GSVD (Howland et al., 2003, Howland and Park, 2004, Ye et al., 2004), orthogonal LDA (Ching et al., 2012, Chu and Goh, 2010, Park et al., Ye, 2005, Ye and Xiong, 2006), complete LDA (CLDA) (Lu et al., 2012, Yang and Yang, 2003), dual-space LDA (DSLDA) (Wang and Tang, 2004, Zheng and Tang, 2009), Bayes optimal LDA (Hamsici & Martinez, 2008) and least squares LDA (Ye, 2007), have been proposed in the literature. These extensions of LDA can deal with high-dimensional samples and learn the optimal projection matrix.
Among these LDA-based algorithms, NLDA and CLDA can provide good classification performance for the SSS problem. Chen et al. (2000) first proposed the original NLDA method, which makes use of the discriminant information in the null space of S_w. However, the implementation of NLDA in Chen et al. (2000) is computationally expensive, since it must compute the entire null space of S_w, which is often very large for high-dimensional samples. In order to reduce the computational complexity of NLDA, Huang et al. (2002) proposed the PCA + NLDA method, which is theoretically equivalent to the original NLDA method but more efficient. Chu and Thye (2010) proposed another implementation of NLDA that relies only on QR decompositions and is therefore more efficient. Recently, Sharma and Paliwal (2012) proposed a new computationally fast procedure for NLDA, which is the fastest among the existing implementations of NLDA. A drawback of NLDA is that it loses some useful discriminant information in the principal space of the training samples.
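The PCA + NLDA route can be sketched in three steps: restrict to the range of S_t, take the null space of S_w inside it, and maximize the between-class scatter there. The function name, data layout and tolerance below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def nlda(X, y, tol=1e-8):
    """Sketch of null-space LDA via the PCA + NLDA route. X is d x n with
    one sample per column; y holds integer class labels."""
    mu = X.mean(axis=1, keepdims=True)
    classes, counts = np.unique(y, return_counts=True)
    # "Square roots" of the scatter matrices: Sw = Hw Hw^T, Sb = Hb Hb^T
    Hw = np.hstack([X[:, y == k] - X[:, y == k].mean(axis=1, keepdims=True)
                    for k in classes])
    Hb = np.hstack([np.sqrt(nk) * (X[:, y == k].mean(axis=1, keepdims=True) - mu)
                    for k, nk in zip(classes, counts)])
    # Step 1: orthonormal basis U of range(St) via a thin SVD of centered data
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    U = U[:, s > tol * s[0]]
    # Step 2: null space of U^T Sw U (eigenvectors with near-zero eigenvalues)
    B = U.T @ Hw
    lam, V = np.linalg.eigh(B @ B.T)
    W = U @ V[:, lam < tol * lam.max()]
    # Step 3: maximize between-class scatter inside that null space
    C = W.T @ Hb
    lam_b, Vb = np.linalg.eigh(C @ C.T)
    return W @ Vb[:, ::-1][:, :len(classes) - 1]
```

For n linearly independent samples from c classes, the null space of S_w inside range(S_t) has dimension c - 1, so the routine returns c - 1 discriminant vectors on which the within-class scatter vanishes.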
The CLDA method, proposed by Yang and Yang (2003), makes full use of the discriminant information. That is, it derives discriminant vectors not only from the null space of S_w but also from the range space of S_w. As a result, CLDA can extract more useful discriminant information than the other methods. However, the implementation of CLDA in Yang and Yang (2003) is computationally expensive. Recently, Lu et al. (2012) proposed another computationally fast implementation of CLDA, which is faster than Yang's implementation.
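Conceptually, CLDA splits range(S_t) into the null part and range part of S_w and extracts discriminant vectors from both: "irregular" vectors where S_w vanishes, and "regular" vectors where the Fisher criterion is well defined. The sketch below illustrates that split; the interface and tolerance are illustrative assumptions, not the implementation proposed in the paper:

```python
import numpy as np

def clda(X, y, tol=1e-8):
    """Sketch of complete LDA: irregular discriminant vectors from the
    null space of Sw (within range(St)) plus regular vectors from the
    range of Sw. X is d x n, one sample per column."""
    mu = X.mean(axis=1, keepdims=True)
    classes, counts = np.unique(y, return_counts=True)
    Hw = np.hstack([X[:, y == k] - X[:, y == k].mean(axis=1, keepdims=True)
                    for k in classes])                     # Sw = Hw Hw^T
    Hb = np.hstack([np.sqrt(nk) * (X[:, y == k].mean(axis=1, keepdims=True) - mu)
                    for k, nk in zip(classes, counts)])    # Sb = Hb Hb^T
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    U = U[:, s > tol * s[0]]                               # basis of range(St)
    B = U.T @ Hw
    lam, V = np.linalg.eigh(B @ B.T)
    small = lam < tol * lam.max()
    Wn, Wr = U @ V[:, small], U @ V[:, ~small]             # null / range of Sw
    # Irregular vectors: maximize the between-class scatter inside null(Sw)
    Cn = Wn.T @ Hb
    _, Vn = np.linalg.eigh(Cn @ Cn.T)
    G_irr = Wn @ Vn[:, ::-1]
    # Regular vectors: Fisher criterion where Sw is positive definite
    Swr = (Wr.T @ Hw) @ (Wr.T @ Hw).T
    Sbr = (Wr.T @ Hb) @ (Wr.T @ Hb).T
    ev, Vr = np.linalg.eig(np.linalg.solve(Swr, Sbr))
    G_reg = Wr @ Vr.real[:, np.argsort(-ev.real)]
    return G_irr, G_reg
```

The irregular vectors coincide with the NLDA directions, which is why an efficient CLDA procedure yields an efficient NLDA procedure as a by-product.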
In order to further reduce the computational burden of CLDA, in this paper we propose a new and fast implementation of CLDA. Our new procedure for CLDA is faster than the existing ones, yet theoretically equivalent to them. Since CLDA is an extension of NLDA, our implementation of CLDA also provides a fast implementation of NLDA. Compared with the other existing implementations of NLDA, the proposed implementation is the most efficient.
The rest of this paper is organized as follows. In Section 2, we briefly review related work on the LDA, NLDA and CLDA algorithms. In Section 3, we propose a new and fast implementation of CLDA and NLDA. Section 4 is devoted to the experiments. Finally, we conclude the paper in Section 5.
Outline of LDA
Given a data matrix A = [a_1, a_2, …, a_n] ∈ R^{d×n}, where a_i, for i = 1, …, n, is the i-th training sample in a d-dimensional space, A_j, for j = 1, …, c, is the collection of training samples from the j-th class, and n = n_1 + n_2 + … + n_c. Let N_j be the set of column indices that belong to the j-th class, i.e., a_i, for i ∈ N_j, belongs to the j-th class. In the classical LDA, three scatter matrices, i.e., the within-class, between-class and total scatter matrices, are defined, respectively, as follows:

S_w = Σ_{j=1}^{c} Σ_{i∈N_j} (a_i − μ_j)(a_i − μ_j)^T,
S_b = Σ_{j=1}^{c} n_j (μ_j − μ)(μ_j − μ)^T,
S_t = Σ_{i=1}^{n} (a_i − μ)(a_i − μ)^T = S_w + S_b,

where μ_j denotes the mean of the samples in the j-th class and μ the mean of all training samples.
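These definitions translate directly into code. The sketch below (the function name is an illustrative choice) follows the column-per-sample convention above and also verifies the identity S_t = S_w + S_b:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw), between-class (Sb) and total (St) scatter
    matrices for X (d x n, one sample per column) with labels y."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)            # global mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)     # class mean
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    St = Sw + Sb                                  # identity St = Sw + Sb
    return Sw, Sb, St
```

All three matrices are d x d and symmetric positive semidefinite; in the SSS setting (n < d) each is necessarily singular, since its rank is bounded by the number of samples.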
New implementations of CLDA and NLDA
In this section, we will present our new and fast implementation of CLDA, which is much faster than the existing ones. Since CLDA is an extension of NLDA, our implementation of CLDA also provides a fast implementation of NLDA. When dealing with high-dimensional data, we can assume that the training samples are linearly independent (Ye & Xiong, 2006). Then we have rank(S_t) = n − 1 and rank(S_w) = n − c.
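These rank identities, rank(S_t) = n − 1 and rank(S_w) = n − c for n linearly independent samples from c classes, can be checked numerically on synthetic data (the dimensions below are illustrative; Gaussian samples with n < d are linearly independent almost surely):

```python
import numpy as np

# Sanity check of the rank identities used in this section
# (d = 100, n = 15, c = 3 are illustrative choices).
rng = np.random.default_rng(1)
d, c, per = 100, 3, 5
n = c * per
X = rng.standard_normal((d, n))           # a.s. linearly independent columns
y = np.repeat(np.arange(c), per)
Ht = X - X.mean(axis=1, keepdims=True)    # St = Ht Ht^T
Hw = np.hstack([X[:, y == k] - X[:, y == k].mean(axis=1, keepdims=True)
                for k in range(c)])       # Sw = Hw Hw^T
print(np.linalg.matrix_rank(Ht @ Ht.T))   # n - 1 = 14
print(np.linalg.matrix_rank(Hw @ Hw.T))   # n - c = 12
```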
Experiments and results
In this section, we will test our proposed implementations of CLDA and NLDA on the ORL, AR and FERET face databases in terms of computational efficiency and recognition accuracy. We compare our proposed implementations with the different existing implementations of NLDA, i.e., PCA + NLDA (Huang et al., 2002), QR + NLDA (Chu & Thye, 2010) and Sharma's NLDA (Sharma & Paliwal, 2012), and of CLDA, i.e., Yang's CLDA (Yang & Yang, 2003) and Lu's CLDA (Lu et al., 2012). The Matlab code
Conclusions
In this paper, we have derived a new and fast implementation of complete linear discriminant analysis (CLDA). Besides, we also proposed a new and fast implementation of null-space-based linear discriminant analysis (NLDA). The proposed implementations of CLDA and NLDA are shown to be faster than the original ones. This computational advantage is achieved without any degradation in classification performance. Experiments on the ORL, AR and FERET databases demonstrated the effectiveness of our proposed methods.
Acknowledgments
This research is supported by Anhui Provincial Natural Science Foundation (No. 1308085MF95), the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information (Nanjing University of Science and Technology), Ministry of Education (Grant No. 30920130122005), China Postdoctoral Science Foundation (2013M531251), NSFC of China (Nos 61231002, 61073137, 61203243), the Natural Science Foundation of the Anhui Higher Education Institutions of China (No. KJ2013B031) and Jiangxi
References (33)
- Baldi and Hatfield (2002). DNA microarrays and gene expression: from experiments to data analysis and modeling.
- Belhumeur et al. (1997). Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Berry et al. (1995). Using linear algebra for intelligent information retrieval. SIAM Review.
- Bishop (2006). Pattern recognition and machine learning.
- Cai, He, and Han (2008). SRDA: an efficient algorithm for large scale discriminant analysis. IEEE Transactions on Knowledge and Data Engineering.
- Chen et al. (2000). A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition.
- Ching et al. (2012). Regularized orthogonal linear discriminant analysis. Pattern Recognition.
- Chu and Goh (2010). A new and fast orthogonal linear discriminant analysis on undersampled problems. SIAM Journal on Scientific Computing.
- Chu and Thye (2010). A new and fast implementation for null space based linear discriminant analysis. Pattern Recognition.
- Duda et al. (2000). Pattern classification.
- Dudoit et al. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association.
- Jin, Yang, Hu et al. (2001). Face recognition based on the uncorrelated discriminant transformation. Pattern Recognition.
- Jin, Yang, Tang et al. (2001). A theorem on the uncorrelated optimal discriminant vectors. Pattern Recognition.
- Lu et al. (2012). Incremental complete LDA for face recognition. Pattern Recognition.
- Sharma and Paliwal (2012). A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices. Pattern Recognition.
- Yang and Yang (2003). Why can LDA be performed in PCA transformed space? Pattern Recognition.