Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction
Highlights
► An efficient ITR-Score algorithm is proposed for solving the trace ratio problem. ► ITR-Score-based TR-SDA is presented for semi-supervised dimensionality reduction. ► TR-SDA preserves both discriminative and geometrical structure. ► We examine TR-SDA through extensive simulations on synthetic and real-world datasets. ► TR-SDA delivers significant improvements compared with other state-of-the-art algorithms.
Introduction
Dealing with high-dimensional data has always been a major challenge in pattern recognition and machine learning. Typical applications involving high-dimensional data include face recognition, document categorization and image retrieval. Finding a low-dimensional representation of a high-dimensional space, namely dimensionality reduction, is thus of great practical importance. The goal of dimensionality reduction is to reduce the complexity of the original space and embed the high-dimensional space into a low-dimensional one while keeping most of the desired intrinsic information [1], [2]. The desired information can be discriminative [11], [12], [15], [16], [17], geometrical [1], [2], [13], [14], [46] or both discriminative and geometrical [19], [20], [21], [22], [23]. Among dimensionality reduction methods, Linear Discriminant Analysis (LDA) [11], [12] is the most popular and has been widely used in many classification applications. The goal of LDA is to find the optimal low-dimensional representation of the original dataset by maximizing the between-class scatter matrix Sb while minimizing the within-class scatter matrix Sw. The original formulation of LDA, known as Fisher LDA [11], can only deal with binary classification. For multi-class classification, the basic LDA has to be extended using one of two main criteria: the ratio trace criterion and the trace ratio criterion.
In ratio trace (or determinant ratio) LDA, the within-class scatter matrix is assumed to be nonsingular, and the optimal projection can be found by generalized eigenvalue decomposition (GEVD) [35]. However, ratio trace LDA may become ill-posed when the number of data points is smaller than the number of features [34], [44], [45]. Several variants of ratio trace LDA have been proposed to solve this problem, such as null-space LDA [25], uncorrelated LDA [26], LDA/GSVD [27] and Discriminative Common Vectors [28]. Another widely used criterion of LDA is the trace ratio criterion. Unlike the former, the trace ratio criterion directly reflects the Euclidean distances between inter-class and intra-class data points. In addition, the optimal projection obtained by trace ratio LDA is orthogonal, while the one obtained by ratio trace LDA is non-orthogonal. Recently, there has been increasing interest in finding orthogonal projections for dimensionality reduction [29], [30], [31]. As described in [4], when evaluating similarities between data points based on Euclidean distance, a non-orthogonal projection may put different weights on different projection directions and thus change the similarities, whereas an orthogonal projection preserves them. Trace ratio LDA therefore tends to perform empirically better than ratio trace LDA in many classification problems. In this paper, we focus on trace ratio LDA and, for convenience, denote it as TR-LDA.
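For concreteness, the GEVD route for ratio trace LDA can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the scatter matrices below are synthetic stand-ins for Sb and Sw, and SciPy's generalized symmetric eigensolver is one possible choice. Note that the resulting projection is Sw-orthogonal rather than orthogonal, matching the non-orthogonality noted above.

```python
import numpy as np
from scipy.linalg import eigh

# Ratio trace LDA via generalized eigendecomposition: Sb w = lambda * Sw w.
# Assumes Sw is nonsingular (the well-posed case discussed in the text).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sw = A @ A.T + np.eye(5)              # synthetic SPD within-class scatter
B = rng.standard_normal((5, 2))
Sb = B @ B.T                          # synthetic low-rank between-class scatter

vals, vecs = eigh(Sb, Sw)             # generalized eigenvalue problem (GEVD)
W = vecs[:, np.argsort(vals)[::-1][:2]]   # top-2 generalized eigenvectors
# SciPy normalizes the eigenvectors so that W' Sw W = I: the columns are
# Sw-orthogonal, not orthogonal in the ordinary Euclidean sense.
```
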
Solving the trace ratio problem of LDA directly has always been difficult, because there is no closed-form solution [7]. Several attempts have been made to find the optimal solution [3], [4], [5], [6], [7], [8]. Guo et al. [3] pointed out that the original TR problem can be converted into an equivalent trace difference problem, which can be solved by a heuristic bisection method. More recently, Wang et al. [4] proposed a faster algorithm, called the ITR algorithm, which finds the optimal solution through an iterative procedure. In this paper, we further analyze the ITR algorithm and discuss the drawbacks of its training strategy. We then propose a new efficient algorithm, called the ITR-Score algorithm, which improves on the original ITR algorithm. The proposed algorithm can be viewed as a greedy strategy for finding the optimum of the TR problem, and it is more efficient than previous approaches.
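The iterative procedure underlying the ITR algorithm of [4] can be sketched roughly as follows: alternate between evaluating the current trace ratio and re-solving a trace difference eigenproblem. This is a schematic of that style of iteration, not the ITR-Score variant proposed in this paper; the identity-based initialization and stopping rule are placeholder assumptions.

```python
import numpy as np

def itr_trace_ratio(Sb, Sw, d, n_iter=100, tol=1e-10):
    """Maximize tr(W' Sb W) / tr(W' Sw W) over orthonormal W (D x d),
    ITR-style: alternate the ratio update with a trace-difference eigenstep."""
    D = Sb.shape[0]
    W = np.eye(D)[:, :d]                      # placeholder initialization
    lam = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
    for _ in range(n_iter):
        # W <- top-d eigenvectors of the trace-difference matrix Sb - lam * Sw
        vals, vecs = np.linalg.eigh(Sb - lam * Sw)
        W = vecs[:, np.argsort(vals)[::-1][:d]]
        lam_new = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        if abs(lam_new - lam) < tol:          # ratio has converged
            lam = lam_new
            break
        lam = lam_new
    return W, lam
```

Because each W comes from a symmetric eigendecomposition, the returned projection is orthonormal, in line with the orthogonality property of trace ratio LDA discussed above.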
In general, TR-LDA is supervised, which means it requires label information. Although TR-LDA works well [3], [4], it needs a considerable number of labeled data points to deliver satisfactory results. In many practical cases, however, obtaining a sufficient number of labeled data points for training is problematic, because labeling large amounts of data is time-consuming and costly. On the other hand, unlabeled data are often abundant and easily obtained in the real world. Thus, semi-supervised learning methods [19], [20], [21], [22], [23], [24], [47], which incorporate both labeled and unlabeled data into the learning procedure, have become an effective alternative to purely supervised learning. In this paper, we propose an orthogonal constrained framework for semi-supervised learning. Under this framework, TR-LDA can be extended to a semi-supervised version called trace ratio based semi-supervised discriminant analysis (TR-SDA). Furthermore, by analyzing the relationship between the supervised and semi-supervised TR problems, we show that the proposed ITR-Score algorithm can be extended to solve the semi-supervised TR problem.
The main contributions of this paper are summarized as follows:
- (1)
As an extension of TR-LDA, the proposed TR-SDA finds an optimal low-dimensional projection that preserves the discriminative information embedded in the labeled set as well as the geometric information embedded in both the labeled and unlabeled sets. As with TR-LDA, the optimal projection obtained by TR-SDA is orthogonal, so Euclidean-distance-based similarities between data points are preserved without change.
- (2)
We propose a new method, called the ITR-Score algorithm, to solve the supervised and semi-supervised TR problems. By improving both the initialization and the training strategy of the original ITR algorithm, the proposed method converges faster, making ITR-Score more efficient than ITR.
- (3)
We propose an orthogonal constrained framework for semi-supervised learning. Under this framework, the TR-SDA algorithm can be related to several existing semi-supervised algorithms such as SDA [19], Lap-LDA [21], SSMMC [22] and SSDR [20]. In short, our algorithm can be viewed as an improvement or extension of these algorithms.
- (4)
The proposed TR-SDA can easily be extended to a nonlinear version using the kernel trick [32], [33]. In this paper, we restrict the nonlinear projection to lie in an orthogonal basis of a high-dimensional Hilbert space and then perform linear dimensionality reduction in that basis. Finally, we connect TR-LDA, TR-SDA and their corresponding kernel versions in a unified form.
The rest of this paper is organized as follows. In Section 2, we briefly describe the basic ideas of LDA and TR-LDA, review previous work on solving the TR problem and propose our improved method. In Section 3, we propose an orthogonal constrained framework for semi-supervised learning and extend TR-LDA to its corresponding semi-supervised version, TR-SDA. In Section 4, we extend our algorithm to nonlinear problems using the kernel trick. Simulation results are presented in Section 5 and conclusions are drawn in Section 6.
Section snippets
Trace ratio problem
In this section, we first review the basic idea of Linear Discriminant Analysis. The goal of LDA is to find a linear transformation matrix W∈RD×d for which the between-class scatter is maximized while the within-class scatter is minimized. Let X={x1,x2,…,xl} be the training set, where each xi belongs to a class ci∈{1,2,…,c}. Let li be the number of data points in the ith class and l the total number of data points; we define the between-class scatter matrix Sb, within-class
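The scatter matrices Sb and Sw referred to here can be computed directly from a labeled training set. The following sketch assumes the standard definitions (class-size-weighted between-class scatter, per-class centered within-class scatter):

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices.

    X: (l, D) data matrix; y: (l,) integer class labels.
    """
    mean_all = X.mean(axis=0)
    D = X.shape[1]
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]                         # points of the ith class
        mc = Xc.mean(axis=0)                   # class mean
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * diff @ diff.T          # l_i-weighted between-class term
        Sw += (Xc - mc).T @ (Xc - mc)          # class-centered within-class term
    return Sb, Sw
```

A useful sanity check on these definitions is the identity Sb + Sw = St, where St is the total scatter matrix of the centered data.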
Orthogonal constrained semi-supervised learning framework
The algorithms for solving the TR problem are all supervised. To exploit unlabeled data points and achieve satisfactory results, many works incorporate both the labeled and unlabeled sets into the learning procedure [19], [20], [21], [22], [23]. In this paper, we first introduce a semi-supervised learning framework. Denote the whole dataset as the union of a labeled set, with its associated label matrix, and an unlabeled set; the
Kernelization
The proposed TR-SDA is a linear algorithm. In this section, we will extend it to solve the nonlinear problem using kernel trick [32], [33]. For convenience, we denote the kernel version of TR-SDA as TR-KSDA.
The basic idea of the kernel trick is to map the original data space to a high-dimensional Hilbert space via ϕ:X→F and then perform linear dimensionality reduction in the new space. Let ϕ(X)={ϕ(x1),ϕ(x2),…,ϕ(xl+u)} be the mapped data; we assume the map can be implicitly
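Since the map ϕ is only ever accessed through inner products, kernel methods work with the Gram matrix K, where K[i, j] = ⟨ϕ(xi), ϕ(xj)⟩ = k(xi, xj), rather than with ϕ(X) itself. A minimal sketch, with an RBF kernel as an illustrative choice (the paper does not fix a specific kernel here):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    The implicit map phi is never formed; every computation downstream
    (e.g. kernelized scatter matrices) goes through K alone.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))    # clamp tiny negatives
```

As expected of a valid kernel, the resulting Gram matrix is symmetric positive semi-definite with unit diagonal.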
Related work
In this paper, we propose a semi-supervised version of TR-LDA. To the best of our knowledge, several recently proposed semi-supervised dimensionality reduction methods share the objectives of this paper [19], [20], [21], [22]. By analyzing the strategies and mechanisms of these algorithms, we show that our proposed algorithm is, in fact, an improvement or extension of them.
Simulations and results
We evaluate our algorithms on several synthetic and real-world datasets. For the synthetic datasets, we use a 2D Gaussian dataset to show the discriminative boundary learned by our algorithm and a 3D Gaussian dataset to visualize the output in a 2D reduced space. We also use a two-moon dataset for a nonlinear classification problem and show the discriminative boundary learned by our algorithm. In addition, we demonstrate visualization and classification on four real
Conclusions
A new efficient algorithm for finding the optimal solution of the trace ratio problem is proposed. Based on this algorithm, we derive an orthogonal constrained semi-supervised learning framework and show that the algorithm can be extended to solve the corresponding semi-supervised problems. The essence of the proposed algorithm is that it incorporates the unlabeled set into the learning procedure to preserve the geometrical structure embedded in both the labeled and unlabeled sets. Also, the algorithm
Mingbo Zhao is currently pursuing his Ph.D. degree at the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He received his B.Eng. degree and master degree from the Department of Electronic Engineering, Shanxi University, Shanxi, PR China in 2005 and 2008, respectively. His current interests include data mining, machine learning, pattern recognition, and their applications.
References (46)
- A generalized Foley–Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition, Pattern Recognition Letters, 2003.
- Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognition, 2008.
- Supervised dimensionality reduction via sequential semidefinite programming, Pattern Recognition, 2008.
- Class label versus sample label-based CCA, Applied Mathematics and Computation, 2007.
- A unified framework for semi-supervised dimensionality reduction, Pattern Recognition, 2008.
- A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 2000.
- Face recognition based on uncorrelated discriminant transformation, Pattern Recognition, 2001.
- Solving the small sample size problem in face recognition using generalized discriminant analysis, Pattern Recognition, 2006.
- Orthogonal neighborhood preserving discriminant analysis for face recognition, Pattern Recognition, 2008.
- Why can LDA be performed in PCA transformed space?, Pattern Recognition, 2003.
- Outlier-resisting graph embedding, Neurocomputing.
- Semi-supervised Gaussian process latent variable model with pairwise constraints, Neurocomputing.
- A global geometric framework for nonlinear dimensionality reduction, Science.
- Nonlinear dimensionality reduction by locally linear embedding, Science.
- Trace ratio problem revisited, IEEE Transactions on Neural Networks.
- The use of multiple measurements in taxonomic problems, Annals of Eugenics.
- Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation.
- Efficient and robust feature extraction by maximum margin criterion, IEEE Transactions on Neural Networks.
Outlier-resisting graph embedding
Neurocomputing
Semi-supervised Gaussian process latent variable model with pairwise constraints
Neurocomputing
A global geometric framework for nonlinear dimensionality reduction
Science
Nonlinear dimensionality reduction by locally linear embedding
Science
Trace ratio problem revisited
IEEE Transactions on Neural Network
The use of multiple measurements in taxonomic problems
Annals of Eugenics
Eigenfaces vs. fisherfaces: recognition using class specific linear projection
IEEE Transactions on Pattern analysis and Machine Intelligence
Face recognition using Laplacianfaces
IEEE Transactions on Pattern Analysis and Machine Intelligence
Laplacian eigenmaps for dimensionality reduction and data representation
Neural Computation
Efficient and robust feature extraction by maximum margin criterion
IEEE Transactions on Neural Networks
Zhao Zhang is currently working towards his Ph.D. degree at the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He received his B.Eng. (First Hons.) and master degrees from the Department of Computer Science and Technology, Nanjing Forestry University, Nanjing, PR China in 2008 and 2010, respectively. His current interests include machine learning, pattern recognition and computational intelligence.
Tommy W. S. Chow (IEEE M'93–SM'03) received his B.Sc. (First Hons.) and Ph.D. degrees from the University of Sunderland, Sunderland, UK. He joined the City University of Hong Kong, Hong Kong, as a Lecturer in 1988. He is currently a Professor in the Electronic Engineering Department. His research interests are in the area of machine learning, including supervised and unsupervised learning, data mining, pattern recognition and fault diagnostics. He worked for NEI Reyrolle Technology at Hebburn, England, developing a digital simulator for a transient network analyser. He then worked on a research project involving high-current-density current collection systems for superconducting direct-current machines, in collaboration with the Ministry of Defence (Navy) at Bath, England, and International Research and Development at Newcastle upon Tyne. He has authored or coauthored over 120 technical papers in international journals, 5 book chapters, and over 60 technical papers in international conference proceedings.