Neural Networks

Volume 46, October 2013, Pages 165-171

Complexity-reduced implementations of complete and null-space-based linear discriminant analysis

https://doi.org/10.1016/j.neunet.2013.05.010

Abstract

Dimensionality reduction has become an important data preprocessing step in many applications. Linear discriminant analysis (LDA) is one of the best-known dimensionality reduction methods. However, classical LDA cannot be used directly in the small sample size (SSS) problem, where the within-class scatter matrix is singular. Many generalized LDA methods have been reported to address the SSS problem. Among these methods, complete linear discriminant analysis (CLDA) and null-space-based LDA (NLDA) provide good performance, but the existing implementations of CLDA are computationally expensive. In this paper, we propose a new and fast implementation of CLDA that is theoretically equivalent to the existing implementations while being the most efficient one. Since CLDA is an extension of NLDA, our implementation of CLDA also yields a fast implementation of NLDA. Experiments on several real-world data sets demonstrate the effectiveness of the proposed CLDA and NLDA algorithms.

Introduction

Many applications, including face recognition, machine learning and data mining, involve data in high-dimensional spaces. Such high-dimensional data sets present new challenges for data processing and analysis. To discover the intrinsic structure of high-dimensional data and to manipulate such data efficiently, it is essential to reduce the dimensionality significantly. Dimensionality reduction has therefore become a ubiquitous preprocessing step for high-dimensional data: after dimensionality reduction, the data can be transformed into a low-dimensional space with limited loss of information.

In past decades, many dimensionality reduction algorithms have been proposed to handle such high-dimensional data. Two well-known dimensionality reduction methods are principal component analysis (PCA), which is an unsupervised learning algorithm, and linear discriminant analysis (LDA) (Duda et al., 2000, Fukunaga, 1990), which is a supervised learning technique. Generally, LDA is more suitable than PCA for classification problems (Duda et al., 2000).

LDA attempts to find an optimal projection matrix such that, in the projected space, samples of different classes are as far from each other as possible while samples of the same class are as close to each other as possible. The optimal projection matrix can be readily computed by applying a generalized eigen-decomposition to the within-class scatter matrix (Sw) and the between-class scatter matrix (Sb) of the given training samples. LDA has been widely applied for decades in many applications such as text classification (Cai, He, & Han, 2008), face recognition (Belhumeur et al., 1997, Jin, Yang, Hu et al., 2001, Lu et al., 2003), information retrieval (Berry et al., 1995, Howland et al., 2003), pattern recognition (Bishop, 2006, Fukunaga, 1990), and microarray data analysis (Baldi and Hatfield, 2002, Dudoit et al., 2002).
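Concretely, and following standard formulations (e.g., Fukunaga, 1990), LDA maximizes the Fisher criterion, and the columns of the optimal projection matrix are the leading generalized eigenvectors of the pair $(S_b, S_w)$:
$$W^{*}=\arg\max_{W}\operatorname{tr}\!\left((W^{\top}S_{w}W)^{-1}W^{\top}S_{b}W\right),\qquad S_{b}\,w_{i}=\lambda_{i}\,S_{w}\,w_{i},$$
where the $w_i$ associated with the largest eigenvalues $\lambda_i$ form the projection matrix. The scatter matrices $S_w$ and $S_b$ are defined in Section 2.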

Classical LDA requires that the total scatter matrix (St) be nonsingular. In many applications, however, the number of samples is smaller than the dimensionality of the samples, which is known as the small sample size (SSS) problem or undersampled problem (Howland et al., 2006, Krzanowski et al., 1995); in this case all the scatter matrices are singular. As a result, classical LDA cannot be applied directly to the SSS problem.

In order to make the classical LDA applicable to the SSS problem, many LDA-based methods, e.g., PCA + LDA (Belhumeur et al., 1997), uncorrelated LDA (ULDA) (Jin, Yang, Hu et al., 2001, Jin, Yang, Tang et al., 2001, Ye, 2005, Ye et al., 2006), null-space-based LDA (NLDA) (Chen et al., 2000, Chu and Thye, 2010, Huang et al., 2002, Sharma and Paliwal, 2012), LDA/GSVD (Howland et al., 2003, Howland and Park, 2004, Ye et al., 2004), orthogonal LDA (Ching et al., 2012, Chu and Goh, 2010, Park et al., 0000, Ye, 2005, Ye and Xiong, 2006), complete LDA (CLDA) (Lu et al., 2012, Yang and Yang, 2003), dual-space LDA (DSLDA) (Wang and Tang, 2004, Zheng and Tang, 2009), Bayes optimal LDA (Hamsici & Martinez, 2008) and least squares LDA (Ye, 2007), have been proposed in the literature. These extensions of LDA can deal with high-dimensional samples and learn the optimal projection matrix.

Among these LDA-based algorithms, NLDA and CLDA provide good classification performance for the SSS problem. Chen et al. (2000) first proposed the original NLDA method, which makes use of the discriminant information in the null space of Sw. However, the implementation of NLDA in Chen et al. (2000) is computationally expensive since it needs to compute the entire null space of Sw, which is often very large for high-dimensional samples. To reduce the computational complexity of NLDA, Huang et al. (2002) proposed the PCA + NLDA method, which is theoretically equivalent to the original NLDA method but more efficient. Chu and Thye (2010) proposed another implementation of NLDA that relies only on QR decompositions and is therefore even more efficient. Recently, Sharma and Paliwal (2012) proposed a new computationally fast procedure for NLDA, which is the fastest among the existing implementations of NLDA. A drawback of NLDA is that it loses some useful discriminant information in the principal space of the training samples.
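Stated formally, NLDA restricts the search to the null space of Sw; one common way of writing the criterion (the exact formulation varies slightly across the papers cited above) is
$$W^{*}=\arg\max_{W:\;W^{\top}S_{w}W=0}\operatorname{tr}\!\left(W^{\top}S_{b}W\right),$$
and the implementations above differ mainly in how a basis of this null space is computed.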

The CLDA method, proposed by Yang and Yang (2003), makes full use of the discriminant information: it derives discriminant vectors not only from the null space of Sw but also from the range space of Sw. As a result, CLDA can extract more useful discriminant information than the other methods. However, the implementation of CLDA in Yang and Yang (2003) carries a heavy computational burden. Recently, Lu et al. (2012) proposed another implementation of CLDA, which is computationally faster than Yang’s implementation.

To further reduce the computational burden of CLDA, in this paper we propose a new and fast implementation of CLDA. Our new procedure for CLDA is faster than the existing ones while being theoretically equivalent to them. Since CLDA is an extension of NLDA, our implementation of CLDA also provides a fast implementation of NLDA. Compared with the other existing implementations of NLDA, the proposed implementation is the most efficient.

The rest of this paper is organized as follows. In Section 2, we briefly review related work on the LDA, NLDA and CLDA algorithms. In Section 3, we propose a new and fast implementation of CLDA and NLDA. Section 4 is devoted to the experiments. Finally, we conclude the paper in Section 5.

Outline of LDA

Given a data matrix $X=\{x_1,x_2,\ldots,x_n\}=[X_1,\ldots,X_c]\in\mathbb{R}^{d\times n}$, where $x_i\in\mathbb{R}^{d}$, for $i=1,2,\ldots,n$, is the $i$th training sample in a $d$-dimensional space, $X_i\in\mathbb{R}^{d\times n_i}$, for $i=1,2,\ldots,c$, is the collection of training samples from the $i$th class, and $\sum_{i=1}^{c}n_i=n$. Let $N_i$ be the set of column indices that belong to the $i$th class, i.e., $x_j$, for $j\in N_i$, belongs to the $i$th class. In the classical LDA, three scatter matrices, i.e., the within-class, between-class and total scatter matrices, are defined, respectively, as follows:
$$S_w=\sum_{i=1}^{c}\sum_{j\in N_i}(x_j-\mu_i)(x_j-\mu_i)^{\top},\qquad S_b=\sum_{i=1}^{c}n_i(\mu_i-\mu)(\mu_i-\mu)^{\top},\qquad S_t=\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)^{\top}=S_w+S_b,$$
where $\mu_i$ denotes the mean of the $i$th class and $\mu$ the mean of all training samples.
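As a purely illustrative aid (the function below and its interface are ours, not part of the original paper), these definitions translate directly into NumPy:

import numpy as np

def scatter_matrices(X, labels):
    # X is the d x n data matrix; labels is a length-n array of class indices.
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)                    # global mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]                            # samples of one class
        mu_c = Xc.mean(axis=1, keepdims=True)             # class mean
        Sw += (Xc - mu_c) @ (Xc - mu_c).T                 # within-class scatter
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter
    St = (X - mu) @ (X - mu).T                            # total scatter, St = Sw + Sb
    return Sw, Sb, St

Forming the d x d scatter matrices explicitly is only practical for moderate d; in the SSS setting the efficient implementations discussed in this paper work with the data matrix itself instead.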

New implementations of CLDA and NLDA

In this section, we present our new and fast implementation of CLDA, which is much faster than the existing ones. Since CLDA is an extension of NLDA, our implementation of CLDA also provides a fast implementation of NLDA. When dealing with high-dimensional data, we can assume that the training samples are linearly independent (Ye & Xiong, 2006). Then we have $\mathrm{rank}(S_b)=c-1$, $\mathrm{rank}(S_w)=n-c$ and $\mathrm{rank}(S_t)=n-1$.
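To make concrete the structure that any CLDA implementation must produce, the sketch below follows the general two-subspace recipe of Yang and Yang (2003): restrict to the range of St, then take irregular discriminant vectors from the null space of the projected Sw and regular ones from its range. It is only an illustration under the rank assumptions above (the function name, tolerances and eigen-decomposition route are ours); it is not the low-complexity procedure derived in this paper.

import numpy as np
from scipy.linalg import eigh

def clda_sketch(Sb, Sw, St, n_regular, n_irregular, tol=1e-10):
    # Restrict to range(St); under the rank assumptions this subspace has dimension n-1.
    s, U = np.linalg.eigh(St)
    U = U[:, s > tol * s.max()]
    Sb1, Sw1 = U.T @ Sb @ U, U.T @ Sw @ U
    # Split the restricted space into the null space and the range of Sw.
    s, V = np.linalg.eigh(Sw1)
    null_Sw = V[:, s < tol * s.max()]                     # dimension c-1
    range_Sw = V[:, s >= tol * s.max()]                   # dimension n-c
    # Irregular discriminant vectors: maximize the between-class scatter inside null(Sw).
    s, Wi = np.linalg.eigh(null_Sw.T @ Sb1 @ null_Sw)
    Wi = null_Sw @ Wi[:, ::-1][:, :n_irregular]
    # Regular discriminant vectors: classical Fisher criterion inside range(Sw),
    # where the restricted Sw is nonsingular.
    s, Wr = eigh(range_Sw.T @ Sb1 @ range_Sw, range_Sw.T @ Sw1 @ range_Sw)
    Wr = range_Sw @ Wr[:, ::-1][:, :n_regular]
    # Map both sets of discriminant vectors back to the original d-dimensional space.
    return U @ Wi, U @ Wr

The irregular vectors alone correspond to NLDA; the contribution of this paper is to obtain both sets of vectors at a much lower computational cost than such eigen-decomposition-based routes.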

Experiments and results

In this section, we test our proposed implementations of CLDA and NLDA on the ORL, AR and FERET face databases in terms of computational efficiency and recognition accuracy. We compare our proposed implementations of NLDA and CLDA with different existing implementations of NLDA, i.e., PCA + NLDA (Huang et al., 2002), QR + NLDA (Chu & Thye, 2010) and Sharma’s NLDA (Sharma & Paliwal, 2012), and of CLDA, i.e., Yang’s CLDA (Yang & Yang, 2003) and Lu’s CLDA (Lu et al., 2012). The Matlab code

Conclusions

In this paper, we have derived a new and fast implementation of complete linear discriminant analysis (CLDA). In addition, we have proposed a new and fast implementation of null-space-based linear discriminant analysis (NLDA). The proposed implementations of CLDA and NLDA are shown to be faster than the original ones, and this computational advantage is achieved without any degradation in classification performance. Experiments on the ORL, AR and FERET databases demonstrated the effectiveness of our proposed algorithms.

Acknowledgments

This research is supported by Anhui Provincial Natural Science Foundation (No. 1308085MF95), the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information (Nanjing University of Science and Technology), Ministry of Education (Grant No. 30920130122005), China Postdoctoral Science Foundation (2013M531251), NSFC of China (Nos 61231002, 61073137, 61203243), the Natural Science Foundation of the Anhui Higher Education Institutions of China (No. KJ2013B031) and Jiangxi

References (33)

  • M.W. Berry et al. Using linear algebra for intelligent information retrieval. SIAM Review (1995)
  • C.M. Bishop. Pattern recognition and machine learning (2006)
  • D. Cai et al. SRDA: an efficient algorithm for large scale discriminant analysis. IEEE Transactions on Knowledge and Data Engineering (2008)
  • D. Chu et al. A new and fast orthogonal linear discriminant analysis on undersampled problems. SIAM Journal on Scientific Computing (2010)
  • R.O. Duda et al. Pattern classification (2000)
  • S. Dudoit et al. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association (2002)