Robust recursive absolute value inequalities discriminant analysis with sparseness
Introduction
Feature extraction plays an important role in pattern recognition. Extracting good features not only identifies the features that contribute most to classification, but also enhances the performance of classifiers, especially for high-dimensional problems Belhumeur et al. (1997), Deng et al. (2012), Martínez and Kak (2001). Generally, dimensionality reduction (DR) is a powerful tool for feature extraction. Among various methods, principal component analysis (PCA) Bro and Smilde (2014), Jolliffe and Cadima (2016), Smith (2002), Turk and Pentland (1991) and linear discriminant analysis (LDA) Fisher (1936), Izenman (2013), Schölkopf and Müller (1999), Wang et al. (2016), Yan et al. (2014) are the most popular approaches for dimensionality reduction. LDA is particularly interesting since it uses class information directly and thus benefits the study of the class relationship between data samples, which is important for multivariate data analysis. In fact, for supervised learning, LDA is regarded as one of the most fundamental and powerful tools. The main idea of LDA is to seek a projection transformation such that the between-class scatter is maximized while the within-class scatter is minimized, which in turn yields a small set of discriminative features.
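The classical LDA projection described above can be sketched in a few lines of numpy; this is a minimal illustration of the scatter-ratio criterion (the function name and data shapes are our own choices, not notation from the paper):

```python
import numpy as np

def lda_projection(X, y, k):
    """Classical L2-norm LDA sketch: maximize between-class scatter
    while minimizing within-class scatter, via the generalized
    eigenproblem Sb w = lambda Sw w."""
    classes = np.unique(y)
    n = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n, n))  # within-class scatter
    Sb = np.zeros((n, n))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)
    # pinv guards against a singular Sw (the SSS situation discussed below)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:k]].real  # top-k discriminant directions
```

On well-separated two-class data, the single returned direction projects the two class means far apart relative to the within-class spread.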
However, it is known that the conventional LDA is sensitive to outliers and noises due to its L2-norm formulation. Compared with the L1-norm distance, which is in essence an absolute-value operation, the L2-norm exaggerates the influence of outliers or noises to some extent, while the robustness of the L1-norm has already been addressed and developed Aanæs et al. (2002), Wang et al. (2012), Kwak (2008), de la Torre and Black (2003), Li et al. (2015), Li et al. (2016), Jeyakumar et al. (2014). Here, we refer to the robust property, or robustness, as the insensitivity to outliers and noises. Moreover, the conventional L2-norm LDA may suffer from the small sample size (SSS) problem owing to the difficulty in evaluating the within-class scatter matrix Cai et al. (2008), Pang et al. (2014), Qiao et al. (2008), Wang and Tang (2004), Xiang et al. (2006) when the number of features is much larger than the number of samples. This makes the direct implementation of LDA an intractable task.
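The robustness contrast between the two norms can be seen in a one-dimensional toy case: the point minimizing summed squared (L2) distances to the data is the mean, while the point minimizing summed absolute (L1) distances is the median. A single outlier drags the mean far off but barely moves the median, which is the intuition behind L1-norm formulations (the numbers below are arbitrary illustrative data):

```python
import numpy as np

# five inliers near 1.0 plus one gross outlier
data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 100.0])
l2_center = data.mean()      # L2 minimizer: dragged toward the outlier
l1_center = np.median(data)  # L1 minimizer: essentially unaffected
print(l2_center, l1_center)
```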
To deal with the above problems, several strategies have been presented. One of them is robust LDA, such as the L1-norm based LDA (LDA-L1) Wang et al. (2014), Zhong and Zhang (2013). It has the same formulation as LDA but with the L2-norm replaced by the L1-norm, which was proved to be more robust than the conventional L2-norm LDA and can avoid the SSS problem. Another strategy is inspired by the study of the support vector machine (SVM). As we know, LDA needs to solve an eigenvalue problem, and so do some SVM models, such as the generalized eigenvalue proximal support vector machine (GEPSVM) (Mangasarian & Wild, 2006), which has been improved by the twin support vector machine (TWSVM) (Khemchandani & Chandra, 2007) and the projection twin support vector machine (PTSVM) (Chen, Yang, & Ye, 2011) by reformulating the eigenvalue problems as quadratic programming problems (QPPs). Following this point of view, the recursive “concave-convex” Fisher linear discriminant (RPFLD) (Ye, Zhao, & Zhang, 2012) reformulates LDA as a combination of multiple related SVM-type (Ye et al., 2012) problems, which avoids the SSS problem. However, both LDA-L1 and RPFLD miss an important characteristic in feature extraction, namely sparseness Clemmensen et al. (2011), Han and Clemmensen (2016), Shi et al. (2014), i.e., discriminant vectors that contain a number of zero entries. Therefore, it is desirable to avoid the SSS problem and implement sparseness when designing a discriminant analysis model.
In this paper, we propose a novel linear discriminant dimensionality reduction algorithm, named recursive absolute value inequalities discriminant analysis (AVIDA). AVIDA introduces an absolute value inequalities loss and an L1-norm regularization term into its formulation, and solves SVM-type problems in its computation. By minimizing the within-class scatter and maximizing the between-class scatter in the absolute value sense, AVIDA makes the projected center of each class far from the projected center of the whole training data while simultaneously close to the projected data samples of its own class. In summary, the proposed AVIDA has the following characteristics:
(i) By introducing the absolute value inequalities loss, AVIDA is insensitive to outliers and noises, which provides a robust alternative to conventional LDA.
(ii) By considering an extra L1-norm regularization term, AVIDA is able to get a sparse solution, and thus can extract the most useful information from the data sets.
(iii) Since AVIDA involves absolute value operations in both its objective and constraints, and an L1-norm regularization term is also added, the objective is neither convex nor differentiable. Therefore, the optimization problem is difficult to solve by conventional optimization techniques. Here we transform the problem into a series of SVM-type problems and solve them as linear programming problems. The finite termination of the solving algorithm can be guaranteed theoretically.
(iv) Due to the use of the absolute value operation, AVIDA can avoid the SSS problem and has no rank limit. Therefore, the number of discriminant directions extracted by AVIDA is not restricted to c − 1, where c is the number of classes, while LDA can extract at most c − 1 of them.
(v) Experimental results support the sparseness and robustness of our AVIDA, and comparison results with other methods also demonstrate its superiority.
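The linear-programming route mentioned in (iii) can be illustrated on a simpler but analogous problem. The sketch below is an illustrative stand-in, not the paper's actual AVIDA subproblem: it shows the standard trick of turning an objective built from absolute values plus an L1 regularizer into a linear program via variable splitting (w = p − q with p, q ≥ 0) and slack variables t ≥ |Xw − y|:

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression_lp(X, y, lam=0.1):
    """Solve min_w ||X w - y||_1 + lam * ||w||_1 as a linear program.
    Variables are stacked as [p (n), q (n), t (m)], all nonnegative,
    with w = p - q and t bounding the absolute residuals."""
    m, n = X.shape
    c = np.concatenate([lam * np.ones(2 * n), np.ones(m)])
    I = np.eye(m)
    # t >= X w - y  and  t >= -(X w - y), written as <= constraints
    A_ub = np.block([[X, -X, -I], [-X, X, -I]])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    p, q = res.x[:n], res.x[n:2 * n]
    return p - q
```

With noiseless data generated from a sparse weight vector and a small `lam`, the LP recovers the true weights almost exactly, since any deviation increases the L1 residual faster than it decreases the regularizer.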
The remainder of the paper is organized as follows. Section 2 briefly dwells on LDA and LDA-L1. Section 3 proposes our AVIDA in the primal space and gives the corresponding theoretical analysis. Section 4 makes comparisons of our AVIDA with LDA, LDA-L1 and other methods. At last, concluding remarks are given in Section 5.
Linear discriminant analysis
In this section, we briefly review the conventional L2-norm linear discriminant analysis. Consider the training set {x_1, x_2, ..., x_N} with the associated class labels belonging to {1, 2, ..., c}, where each x_j is a column vector in R^n for j = 1, ..., N. Assume that the i-th class contains N_i samples, i = 1, ..., c. Then we have N_1 + N_2 + ... + N_c = N. Let x̄ be the mean of all samples and x̄_i be the mean of the samples in the i-th class, where x_j^(i) denotes the j-th sample in the i-th class.
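A useful identity behind this review is that the total scatter decomposes exactly into within-class plus between-class scatter, S_t = S_w + S_b; the following small numpy check (with arbitrary synthetic data of our own choosing) verifies it numerically:

```python
import numpy as np

# two synthetic classes in R^3, sizes chosen arbitrarily
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 3)), rng.normal(4, 1, (25, 3))])
y = np.array([0] * 15 + [1] * 25)

mean_all = X.mean(axis=0)
St = (X - mean_all).T @ (X - mean_all)  # total scatter

Sw = np.zeros((3, 3))  # within-class scatter
Sb = np.zeros((3, 3))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    d = (mc - mean_all).reshape(-1, 1)
    Sb += len(Xc) * (d @ d.T)

# the decomposition S_t = S_w + S_b holds up to rounding error
assert np.allclose(St, Sw + Sb)
```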
The main idea
Problem formulation
As pointed out in Section 1, casting the LDA-L1 problem (6) as a related SVM-type problem may enhance its performance. For this purpose, we maximize the between-class scatter under the constraint of a sufficiently small within-class scatter. To allow for some discriminant errors, slack variables are introduced, and we obtain the following problem. It can be observed that the optimization problem (7) has similar
Experimental results
In this section, we experimentally evaluate our AVIDA against several existing DR methods, including PCA (Turk & Pentland, 1991), PCA-L1 (Kwak, 2008), LDA (Izenman, 2013), RLDA (Friedman, 1989) and LDA-L1 Zhong and Zhang (2013), Wang et al. (2014). The regularization parameter for RLDA is set to 0.001. For LDA-L1, the learning rate parameter is chosen optimally from a candidate set. All of our experiments are carried out on a PC with a 2 GHz Pentium 4 CPU and 2 GB of RAM
Conclusion
This paper proposed a novel sparse discriminant analysis criterion based on the absolute value inequalities loss, called absolute value inequalities discriminant analysis (AVIDA). The main difference between our AVIDA and the existing LDA-L1 is that the former considers the SVM-type formulation of LDA-L1 with additional sparseness. This makes our method more robust to outliers and noises, and also able to extract useful sparse features. The experimental results show the improvement of the proposed AVIDA
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61603338 and No. 11201426), the Zhejiang Provincial Natural Science Foundation of China (No. LQ17F030003 and No. LY15F030013), the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y201534889), the Natural Science Foundation Project of CQ CSTC (cstc2014jcyjA40011) and the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1400513).
References (41)
- Chen, Yang, & Ye (2011). Recursive projection twin support vector machine via within-class variance minimization. Pattern Recognition.
- Han & Clemmensen (2016). Regularized generalized eigen-decomposition with applications to sparse supervised feature extraction and sparse discriminant analysis. Pattern Recognition.
- Li et al. (2015). Robust L1-norm two-dimensional linear discriminant analysis. Neural Networks.
- Wang et al. (2012). Least squares recursive projection twin support vector machine for classification. Pattern Recognition.
- Shi et al. (2014). Face recognition by sparse discriminant analysis via joint L2,1-norm minimization. Pattern Recognition.
- Wang et al. (2016). MBLDA: A novel multiple between-class linear discriminant analysis. Information Sciences.
- Ye, Zhao, & Zhang (2012). Recursive “concave-convex” Fisher linear discriminant with applications to face, handwritten digit and terrain recognition. Pattern Recognition.
- Aanæs et al. (2002). Robust factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Belhumeur et al. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Bro & Smilde (2014). Principal component analysis. Analytical Methods.
- Cai et al. (2008). SRDA: An efficient algorithm for large-scale discriminant analysis. IEEE Transactions on Knowledge and Data Engineering.
- Clemmensen et al. (2011). Sparse discriminant analysis. Technometrics.
- Deng et al. (2012). Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions.
- Fisher (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics.
- Friedman (1989). Regularized discriminant analysis. Journal of the American Statistical Association.
- Izenman (2013). Linear discriminant analysis. In Modern Multivariate Statistical Techniques.
- Jeyakumar et al. (2014). Support vector machine classifiers with uncertain knowledge sets via robust optimization. Optimization.
- Jolliffe & Cadima (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A.
- Khemchandani & Chandra (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence.