
Neural Networks

Volume 93, September 2017, Pages 205-218

Robust recursive absolute value inequalities discriminant analysis with sparseness

https://doi.org/10.1016/j.neunet.2017.05.011

Abstract

In this paper, we propose a novel absolute value inequalities discriminant analysis (AVIDA) criterion for supervised dimensionality reduction. Compared with conventional linear discriminant analysis (LDA), the main characteristics of our AVIDA are robustness and sparseness. By reformulating the generalized eigenvalue problem in LDA as a related SVM-type “concave-convex” problem based on the absolute value inequalities loss, our AVIDA is not only more robust to outliers and noise, but also avoids the small sample size (SSS) problem. Moreover, an additional L1-norm regularization term in the objective ensures that sparse discriminant vectors are obtained. A successive linear algorithm is employed to solve the proposed optimization problem, in which a series of linear programs are solved. The superiority of our AVIDA is supported by experimental results on artificial examples as well as benchmark image databases.

Introduction

Feature extraction plays an important role in pattern recognition. Extracting good features not only identifies the features that contribute most to classification, but also enhances the performance of classifiers, especially for high-dimensional problems Belhumeur et al. (1997), Deng et al. (2012), Martínez and Kak (2001). Generally, dimensionality reduction (DR) is a powerful tool for feature extraction. Among the various methods, principal component analysis (PCA) Bro and Smilde (2014), Jolliffe and Cadima (2016), Smith (2002), Turk and Pentland (1991) and linear discriminant analysis (LDA) Fisher (1936), Izenman (2013), Schölkopf and Müller (1999), Wang et al. (2016), Yan et al. (2014) are the most popular approaches for dimensionality reduction. LDA is particularly interesting since it uses class information directly and thus helps reveal the class relationships among data samples, which is important for multivariate data analysis. In fact, for supervised learning, LDA is regarded as one of the most fundamental and powerful tools. The main idea of LDA is to seek a projection transformation such that the between-class scatter is maximized while the within-class scatter is minimized, which in turn yields a small number of discriminative features.
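For reference, a minimal sketch of this conventional L2-norm LDA procedure is given below (a textbook construction, not the authors' code; the function and variable names are ours):

```python
# Minimal sketch of conventional L2-norm LDA (textbook version, not the paper's method):
# project onto the leading eigenvectors of the generalized eigenproblem Sb w = lambda Sw w,
# which maximizes between-class scatter while minimizing within-class scatter.
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, n_components):
    """X: (N, n) samples, y: (N,) integer labels; returns an (n, n_components) projection."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n = X.shape[1]
    Sw = np.zeros((n, n))   # within-class scatter
    Sb = np.zeros((n, n))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Generalized symmetric eigenproblem; this step fails when Sw is singular,
    # which is exactly the small sample size (SSS) problem discussed below.
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]
```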

However, it is known that conventional LDA is sensitive to outliers and noise due to its L2-norm formulation. Compared with the L1-norm distance, which is in essence an absolute value operation, the L2-norm exaggerates the influence of outliers or noise to some extent, while the robustness of the L1-norm has already been addressed and developed Aanæs et al. (2002), Wang et al. (2012), Kwak (2008), de la Torre and Black (2003), Li et al. (2015), Li et al. (2016), Jeyakumar et al. (2014). Here, robustness refers to insensitivity to outliers and noise. Moreover, conventional L2-norm LDA may suffer from the small sample size (SSS) problem owing to the difficulty of evaluating the within-class scatter matrix Cai et al. (2008), Pang et al. (2014), Qiao et al. (2008), Wang and Tang (2004), Xiang et al. (2006) when the number of features is much larger than the number of samples. This makes the direct implementation of LDA intractable.
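As a toy illustration of this point (our own example, not from the paper), compare how a single corrupted value inflates a squared-deviation measure versus an absolute-deviation measure:

```python
# Illustrative only: the effect of one outlier on L2-norm vs L1-norm deviation measures.
import numpy as np

clean = np.array([1.0, 1.1, 0.9, 1.0])
with_outlier = np.append(clean, 10.0)     # one corrupted sample

for data in (clean, with_outlier):
    center = np.median(data)
    l2 = np.sum((data - center) ** 2)     # squared deviations: the outlier adds about 81
    l1 = np.sum(np.abs(data - center))    # absolute deviations: the outlier adds about 9
    print(f"L2 = {l2:.2f}, L1 = {l1:.2f}")
```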

To deal with the above problems, several strategies have been presented. One of them is robust LDA, such as L1-norm based LDA (LDA-L1) Wang et al. (2014), Zhong and Zhang (2013). It has the same formulation as LDA but with the L2-norm replaced by the L1-norm, which was shown to be more robust than conventional L2-norm LDA and to avoid the SSS problem. Another strategy is inspired by the study of support vector machines (SVMs). As we know, LDA requires solving an eigenvalue problem, and some SVM models also involve eigenvalue problems, such as the generalized eigenvalue proximal support vector machine (GEPSVM) (Mangasarian & Wild, 2006), which has been improved by the twin support vector machine (TWSVM) (Khemchandani & Chandra, 2007) and the projection twin support vector machine (PTSVM) (Chen, Yang, & Ye, 2011) by reformulating the eigenvalue problems as quadratic programming problems (QPPs). Following this point of view, the recursive “concave-convex” Fisher linear discriminant (RPFLD) (Ye, Zhao, & Zhang, 2012) reformulates LDA as a combination of multiple related SVM-type (Ye et al., 2012) problems, which avoids the SSS problem. However, both LDA-L1 and RPFLD miss an important characteristic in feature extraction, namely sparseness Clemmensen et al. (2011), Han and Clemmensen (2016), Shi et al. (2014), that is, discriminant vectors containing a large number of zero entries. Therefore, it is desirable to avoid the SSS problem and achieve sparseness when designing a discriminant analysis model.

In this paper, we propose a novel linear discriminant dimensionality reduction algorithm, named recursive absolute value inequalities discriminant analysis (AVIDA). AVIDA introduces the absolute value inequalities loss and an L1-norm regularization term into its formulation, and solves SVM-type problems in its computation. By minimizing the within-class scatter and maximizing the between-class scatter in the absolute value sense, AVIDA makes the projected center of each class lie far from the projected center of the whole training data while staying close to the projected samples of its own class. In summary, the proposed AVIDA has the following characteristics:

(i) By introducing the absolute value inequalities loss, AVIDA is insensitive to outliers and noise, which provides a robust alternative to conventional LDA.

(ii) By considering an extra L1-norm regularization term, AVIDA obtains a sparse solution, and thus can extract the most useful information from the data sets.

(iii) Since AVIDA involves absolute value operations in both its objective and constraints, and an L1-norm regularization term is also added, the objective is neither convex nor differentiable. Therefore, the optimization problem is difficult to solve with conventional optimization techniques. Here we transform the problem into a series of SVM-type problems and solve them as linear programming problems (see the sketch after this list). The finite termination of the solving algorithm can be guaranteed theoretically.

(iv) Due to the use of the absolute value operation, AVIDA avoids the SSS problem and has no rank limit. Therefore, the discriminant directions extracted by AVIDA are not restricted to at most c−1, where c is the number of classes, whereas LDA can extract at most c−1.

(v) Experimental results support the sparseness and robustness of our AVIDA, and comparisons with other methods also demonstrate its superiority.
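For intuition on item (iii), the following are the standard reformulations that let absolute values be handled by a linear program (textbook facts, not the paper's exact derivation):

```latex
% An absolute-value constraint becomes two linear inequalities:
\[
  |w^{\top} d| \le \xi \quad\Longleftrightarrow\quad -\xi \le w^{\top} d \le \xi .
\]
% The L1 regularizer is linearized by splitting w into nonnegative parts:
\[
  \|w\|_1 \;=\; \min_{u,\,v \ge 0} \; \mathbf{1}^{\top}(u+v)
  \quad \text{subject to} \quad w = u - v .
\]
```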

The remainder of the paper is organized as follows. Section 2 briefly reviews LDA and LDA-L1. Section 3 proposes our AVIDA in the primal space and gives the corresponding theoretical analysis. Section 4 compares our AVIDA with LDA, LDA-L1 and other methods. Finally, concluding remarks are given in Section 5.

Section snippets

Linear discriminant analysis

In this section, we briefly review the conventional L2-norm linear discriminant analysis. Consider the training set $T=\{x_1,x_2,\ldots,x_N\}$ with associated class labels $y_1,y_2,\ldots,y_N$ belonging to $\{1,2,\ldots,c\}$, where $x_l\in\mathbb{R}^n$ is a column vector for $l=1,2,\ldots,N$. Assume that the $i$th class contains $N_i$ samples, $i=1,2,\ldots,c$, so that $\sum_{i=1}^{c}N_i=N$. Let $\bar{x}=\frac{1}{N}\sum_{l=1}^{N}x_l$ be the mean of all samples and $\bar{x}_i=\frac{1}{N_i}\sum_{j=1}^{N_i}x_{ij}$ be the mean of the samples in the $i$th class, where $x_{ij}$ denotes the $j$th sample in the $i$th class.
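For completeness, the standard L2-norm scatter matrices and Fisher criterion built from these means are (textbook definitions, not quoted from the truncated snippet):

```latex
\[
  S_w \;=\; \sum_{i=1}^{c}\sum_{j=1}^{N_i} (x_{ij}-\bar{x}_i)(x_{ij}-\bar{x}_i)^{\top},
  \qquad
  S_b \;=\; \sum_{i=1}^{c} N_i\,(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})^{\top},
\]
\[
  \max_{w \ne 0} \;\frac{w^{\top} S_b\, w}{w^{\top} S_w\, w},
\]
% which leads to the generalized eigenvalue problem  S_b w = \lambda S_w w.
```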

The main idea

Problem formulation

As pointed out in Section 1, casting the LDA-L1 problem (6) as a related SVM-type problem may enhance its performance. For this purpose, we maximize the between-class scatter under the constraint of a sufficiently small within-class scatter. Allowing for some discriminant errors, we obtain the following problem
\[
\begin{aligned}
\min_{w,\,\xi_{ij}} \quad & -\sum_{i=1}^{c} N_i \,\bigl|w^{\top}(\bar{x}_i-\bar{x})\bigr| \;+\; \nu \sum_{i=1}^{c}\sum_{j=1}^{N_i}\xi_{ij}\\
\text{s.t.} \quad & \bigl|w^{\top}(x_{ij}-\bar{x}_i)\bigr| \le \xi_{ij},\;\; \xi_{ij}\ge 0,\quad i=1,2,\ldots,c,\; j=1,2,\ldots,N_i,
\end{aligned}
\tag{7}
\]
where $\xi_{ij}$ are slack variables. It can be observed that the optimization problem (7) has similar
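The absolute values make (7) nonconvex and nondifferentiable, but once the signs of the concave between-class term are frozen at the current iterate, the problem reduces to a linear program. The sketch below shows one such linearized step (our own reconstruction using the reformulations above, with hypothetical trade-off parameters `nu` for the slacks and `lam` for the L1 term; the actual AVIDA algorithm also includes normalization and the recursive deflation of extracted directions, which are omitted here):

```python
# Rough sketch of one successive-linear-algorithm (SLA) step for a problem of the
# form (7) plus an L1 regularizer, reduced to a linear program.
import numpy as np
from scipy.optimize import linprog

def sla_step(X, y, w_k, nu=1.0, lam=0.1):
    """One linearization step around w_k; returns an updated discriminant direction."""
    classes = np.unique(y)
    N, n = X.shape
    x_bar = X.mean(axis=0)
    # Freeze the signs of the concave term: -sum_i N_i |w^T (xbar_i - xbar)| -> a^T w.
    a = np.zeros(n)
    for c in classes:
        Xc = X[y == c]
        diff = Xc.mean(axis=0) - x_bar
        a -= Xc.shape[0] * np.sign(w_k @ diff) * diff
    # Variables z = [u (n), v (n), xi (N)], all nonnegative, with w = u - v,
    # so the L1 regularizer becomes lam * 1^T (u + v).
    cost = np.concatenate([a + lam, -a + lam, nu * np.ones(N)])
    # |w^T (x_ij - xbar_i)| <= xi_ij  ->  two linear inequalities per sample.
    rows, idx = [], 0
    for c in classes:
        Xc = X[y == c]
        D = Xc - Xc.mean(axis=0)
        for d in D:
            e = np.zeros(N); e[idx] = -1.0
            rows.append(np.concatenate([d, -d, e]))
            rows.append(np.concatenate([-d, d, e]))
            idx += 1
    res = linprog(cost, A_ub=np.vstack(rows), b_ub=np.zeros(2 * N),
                  bounds=[(0, None)] * (2 * n + N), method="highs")
    if res.status != 0:
        return w_k  # LP infeasible or unbounded; the full algorithm adds safeguards
    z = res.x
    return z[:n] - z[n:2 * n]
```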

Experimental results

In this section, we experimentally evaluate our AVIDA against several existing DR methods, including PCA (Turk & Pentland, 1991), PCA-L1 (Kwak, 2008), LDA (Izenman, 2013), RLDA (Friedman, 1989) and LDA-L1 Zhong and Zhang (2013), Wang et al. (2014). The regularization parameter for RLDA is set to 0.001. For LDA-L1, the learning rate parameter is chosen optimally from the set {0.001, 0.01, 0.1, 1, 10, 100, 1000}. All of our experiments are carried out on a PC with a P4 2 GHz CPU and 2 GB of RAM

Conclusion

This paper proposed a novel sparse discriminant analysis criterion based on the absolute value inequalities loss, called absolute value inequalities discriminant analysis (AVIDA). The main difference between our AVIDA and the existing LDA-L1 is that the former considers an SVM-type formulation of LDA-L1 with additional sparseness. This makes our method more robust to outliers and noise, and enables it to extract useful sparse features. The experimental results show the improvement of the proposed AVIDA

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61603338 and No. 11201426), the Zhejiang Provincial Natural Science Foundation of China (No. LQ17F030003 and No. LY15F030013), the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y201534889), the Natural Science Foundation Project of CQ CSTC (cstc2014jcyjA40011) and the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1400513).

References (41)

  • Cai, D., et al. (2008). SRDA: An efficient algorithm for large-scale discriminant analysis. IEEE Transactions on Knowledge and Data Engineering.

  • Clemmensen, L., et al. (2011). Sparse discriminant analysis. Technometrics.

  • Deng, N. Y., et al. (2012). Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions.

  • Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics.

  • Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml

  • Friedman, J. H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association.

  • Izenman, A. J. (2013). Linear Discriminant Analysis. Modern Multivariate Statistical Techniques.

  • Jeyakumar, V., et al. (2014). Support vector machine classifiers with uncertain knowledge sets via robust optimization. Optimization.

  • Jolliffe, I. T., et al. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A.

  • Khemchandani, R., et al. (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence.