Elsevier

Pattern Recognition

Volume 41, Issue 3, March 2008, Pages 1083-1097

A comparison of generalized linear discriminant analysis algorithms

https://doi.org/10.1016/j.patcog.2007.07.022

Abstract

Linear discriminant analysis (LDA) is a dimension reduction method which finds an optimal linear transformation that maximizes class separability. However, in undersampled problems, where the number of data samples is smaller than the dimension of the data space, it is difficult to apply LDA due to the singularity of the scatter matrices caused by high dimensionality. In order to make LDA applicable, several generalizations of LDA have been proposed recently. In this paper, we present theoretical and algorithmic relationships among several generalized LDA algorithms and compare their computational complexities and performances in text classification and face recognition. Towards a practical dimension reduction method for high dimensional data, an efficient algorithm is proposed which greatly reduces the computational complexity while achieving competitive prediction accuracies. We also present nonlinear extensions of these LDA algorithms based on kernel methods. It is shown that a generalized eigenvalue problem can be formulated in the kernel-based feature space, and generalized LDA algorithms are applied to solve this generalized eigenvalue problem, resulting in nonlinear discriminant analysis. Performances of these linear and nonlinear discriminant analysis algorithms are compared extensively.

Introduction

Linear discriminant analysis (LDA) seeks an optimal linear transformation by which the original data is transformed to a much lower dimensional space. The goal of LDA is to find a linear transformation that maximizes class separability in the reduced dimensional space. Hence the criteria for dimension reduction in LDA are formulated to maximize the between-class scatter and minimize the within-class scatter. The scatters are measured by scatter matrices such as the between-class scatter matrix ($S_b$), within-class scatter matrix ($S_w$) and total scatter matrix ($S_t$). Let us denote a data set $A$ as
$$A = [a_1, \ldots, a_n] = [A_1, A_2, \ldots, A_r] \in \mathbb{R}^{m \times n},$$
where a collection of data items in the class $i$ ($1 \le i \le r$) is represented as a block matrix $A_i \in \mathbb{R}^{m \times n_i}$ and $N_i$ is the index set of data items in the class $i$. Each class $i$ has $n_i$ elements and the total number of data items is $n = \sum_{i=1}^{r} n_i$. The between-class scatter matrix $S_b$, within-class scatter matrix $S_w$ and total scatter matrix $S_t$ are defined as
$$S_b = \sum_{i=1}^{r} n_i (c_i - c)(c_i - c)^T, \qquad S_w = \sum_{i=1}^{r} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T, \qquad S_t = \sum_{j=1}^{n} (a_j - c)(a_j - c)^T,$$
where $c_i = (1/n_i)\sum_{j \in N_i} a_j$ and $c = (1/n)\sum_{j=1}^{n} a_j$ are the class centroids and the global centroid, respectively.
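As a concrete illustration of these definitions, the following is a minimal NumPy sketch (not code from the paper; the function name and calling convention are assumptions) that forms $S_b$, $S_w$ and $S_t$ from a data matrix whose columns are the samples, together with a label vector giving the class of each column:

```python
import numpy as np

def scatter_matrices(A, labels):
    """Between-class (Sb), within-class (Sw) and total (St) scatter matrices.

    A      : (m, n) array, one data item per column.
    labels : length-n array, labels[j] is the class of column j.
    """
    labels = np.asarray(labels)
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)          # global centroid, shape (m, 1)
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for i in np.unique(labels):
        Ai = A[:, labels == i]                 # block A_i of class-i columns
        ni = Ai.shape[1]
        ci = Ai.mean(axis=1, keepdims=True)    # class centroid c_i
        Sb += ni * (ci - c) @ (ci - c).T
        Sw += (Ai - ci) @ (Ai - ci).T
    St = (A - c) @ (A - c).T                   # note that St = Sb + Sw
    return Sb, Sw, St
```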

The optimal dimension reducing transformation $G^T \in \mathbb{R}^{l \times m}$ ($l < m$) for LDA is the one that maximizes the between-class scatter and minimizes the within-class scatter in the reduced dimensional space. Common optimization criteria for LDA are formulated as the maximization of the objective functions
$$J_1(G) = \frac{\operatorname{trace}(G^T S_b G)}{\operatorname{trace}(G^T S_w G)}, \qquad J_2(G) = \operatorname{trace}\big((G^T S_w G)^{-1}(G^T S_b G)\big), \qquad J_3(G) = \frac{|G^T S_b G|}{|G^T S_w G|},$$
where $\tilde{S}_i = G^T S_i G$ for $i = b, w$ are the scatter matrices in the space transformed by $G^T$. It is well known [1], [2] that when $S_w$ is nonsingular, the transformation matrix $G$ is obtained from the eigenvectors corresponding to the $r-1$ largest eigenvalues of
$$S_w^{-1} S_b g = \lambda g.$$
However, for undersampled problems such as text classification and face recognition, where the number of data items is smaller than the data dimension, the scatter matrices become singular and their inverses are not defined. In order to overcome the problems caused by the singularity of the scatter matrices, several methods have been proposed [3], [4], [5], [6], [7], [8]. In this paper, we present theoretical relationships among several generalized LDA algorithms and compare their computational complexities and performances.
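When $S_w$ is nonsingular, this eigenvalue problem can be solved directly. The sketch below is an illustration under that assumption (using SciPy's generalized symmetric eigensolver; it is not the authors' implementation) and returns the transformation built from the eigenvectors of the $r-1$ largest eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(Sb, Sw, r):
    """Classical LDA transformation for r classes; requires Sw to be
    symmetric positive definite (i.e. nonsingular)."""
    # eigh solves the symmetric-definite problem  Sb g = lambda * Sw g,
    # which is equivalent to  Sw^{-1} Sb g = lambda g; eigenvalues ascend.
    eigvals, eigvecs = eigh(Sb, Sw)
    G = eigvecs[:, ::-1][:, :r - 1]   # eigenvectors of the r-1 largest eigenvalues
    return G

# usage sketch (hypothetical data): reduced = classical_lda(Sb, Sw, r).T @ A
```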

While linear dimension reduction has been used in many application areas due to its conceptual simplicity and computational efficiency, it is difficult to capture a nonlinear relationship in the data with a linear function. Recently, kernel methods have been widely used for nonlinear extensions of linear algorithms [9]. The original data space is transformed to a feature space by an implicit nonlinear mapping through kernel methods. As long as an algorithm can be formulated in terms of inner products, it can be applied in the transformed feature space without knowing the explicit representation of the nonlinear mapping, yielding a nonlinear extension of the original algorithm. We present nonlinear extensions of generalized LDA algorithms through the formulation of a generalized eigenvalue problem in the kernel-based feature space.

The rest of the paper is organized as follows. In Section 2, a theoretical comparison of generalized LDA algorithms is presented. We study theoretical and algorithmic relationships among several generalized LDA algorithms and compare their computational complexities and performances. An efficient algorithm is also proposed which computes exactly the same solution as that in Refs. [4], [10] but greatly reduces the computational complexity. In Section 3, nonlinear extensions of these generalized LDA algorithms are presented. A generalized eigenvalue problem is formulated in the nonlinearly transformed feature space, to which all the generalized LDA algorithms can be applied, resulting in nonlinear dimension reduction methods. Extensive comparisons of these linear and nonlinear discriminant analysis algorithms are conducted. Conclusions follow in Section 4.

For convenience, important notations used throughout the rest of the paper are listed in Table 1.


Regularized LDA

In the regularized LDA (RLDA) [3], when $S_w$ is singular or ill-conditioned, a diagonal matrix $\alpha I$ with $\alpha > 0$ is added to $S_w$. Since $S_w$ is symmetric positive semidefinite, $S_w + \alpha I$ is nonsingular for any $\alpha > 0$. Therefore, we can apply the algorithm for the classical LDA to solve the eigenvalue problem
$$S_b g = \lambda (S_w + \alpha I) g.$$
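A minimal sketch of this step, assuming the scatter matrices have already been formed as above (the function name and the default value of $\alpha$ are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(Sb, Sw, r, alpha=1e-3):
    """RLDA: solve  Sb g = lambda * (Sw + alpha * I) g  for alpha > 0.

    Adding alpha * I makes the right-hand matrix positive definite,
    so the classical solver applies even when Sw itself is singular."""
    m = Sw.shape[0]
    eigvals, eigvecs = eigh(Sb, Sw + alpha * np.eye(m))
    return eigvecs[:, ::-1][:, :r - 1]   # eigenvectors of the r-1 largest eigenvalues
```

Here $\alpha$ acts as a regularization parameter that must be chosen, e.g. by cross-validation.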

Nonlinear discriminant analysis based on kernel methods

Linear dimension reduction is conceptually simple and has been used in many application areas. However, it has a limitation for data that are not linearly separable, since it is difficult to capture a nonlinear relationship with a linear mapping. In order to overcome such a limitation, nonlinear extensions of linear dimension reduction methods using kernel methods have been proposed [20], [21], [22], [23], [24], [25]. The main idea of kernel methods is that, without knowing the nonlinear mapping explicitly, an algorithm can be carried out in the feature space as long as it is formulated in terms of inner products.
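For instance, the Gram matrix of a Gaussian (RBF) kernel collects all the inner products $k(a_i, a_j) = \langle \phi(a_i), \phi(a_j) \rangle$ that such kernel-based formulations need; a small sketch (illustrative only, not the paper's code) is:

```python
import numpy as np

def gaussian_kernel_matrix(A, sigma=1.0):
    """Gram matrix K with K[i, j] = exp(-||a_i - a_j||^2 / (2 * sigma^2))
    for the columns a_1, ..., a_n of A; the feature map is never formed."""
    sq_norms = np.sum(A ** 2, axis=0)                             # ||a_j||^2 per column
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A.T @ A)  # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                                      # guard against round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))
```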

Conclusions/Discussions

We presented the relationships among several generalized LDA algorithms developed for handling undersampled problems and compared their computational complexities and performances. As discussed in the theoretical comparison, many of the algorithms are closely related, and the experimental results indicate that computational complexity is an important issue in addition to classification performance. The LDA/GSVD showed competitive performance throughout the experiments, but its computational cost is high; the proposed efficient algorithm computes the same solution at a greatly reduced cost.


References (27)

  • N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and other Kernel-based Learning Methods, ...
  • P. Howland et al., Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • C.C. Paige et al., Towards a generalized singular value decomposition, SIAM J. Numer. Anal. (1981)

About the Authors

CHEONG HEE PARK received her Ph.D. in Mathematics from Yonsei University, Korea, in 1998. She received the M.S. and Ph.D. degrees in Computer Science from the Department of Computer Science and Engineering, University of Minnesota, in 2002 and 2004, respectively. She is currently an Assistant Professor in the Department of Computer Science and Engineering, Chungnam National University, Korea. Her research interests include pattern recognition, data mining, bioinformatics and machine learning.

HAESUN PARK received her B.S. degree in Mathematics from Seoul National University, Seoul, Korea, in 1981, summa cum laude and with the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She was on the faculty of the Department of Computer Science and Engineering, University of Minnesota, Twin Cities, from 1987 to 2005. From 2003 to 2005, she served as a Program Director in the Computing and Communication Foundations Division at the National Science Foundation, Arlington, VA, USA. Since July 2005, she has been a Professor in the Division of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA. Her current research interests include numerical algorithms, pattern recognition, bioinformatics, information retrieval, and data mining. She has published over 100 research papers in these areas. Prof. Park has served on numerous conference committees and editorial boards of journals. Currently she is on the editorial boards of BIT Numerical Mathematics, SIAM Journal on Matrix Analysis and Applications, International Journal of Bioinformatics Research and Applications, and Statistical Analysis and Data Mining. She is the conference co-chair for the SIAM Conference on Data Mining in 2008 and 2009.

1. This study was financially supported by a research fund of Chungnam National University in 2005.

2. This work was supported in part by the National Science Foundation grant CCF-0621889. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).
