Pattern Recognition

Volume 45, Issue 4, April 2012, Pages 1482-1499
Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction

https://doi.org/10.1016/j.patcog.2011.10.008

Abstract

Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods because it directly reflects the Euclidean distances between within-class and between-class data points. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm for finding its optimal solution. Based on the proposed algorithm, we derive an orthogonal constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure, so it is able to preserve both the discriminative structure and the geometrical structure embedded in the original dataset. Under this framework, many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR and SSMMC, can be improved; the framework can also be used to formulate a corresponding kernel framework for handling nonlinear problems. Theoretical analysis indicates certain relationships between the linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that the proposed algorithm achieves significant improvements over other state-of-the-art algorithms.

Highlights

► An efficient ITR-Score algorithm is proposed for solving the trace ratio problem.
► ITR-Score-based TR-SDA is presented for semi-supervised dimensionality reduction.
► Both discriminative and geometrical structure can be preserved by TR-SDA.
► We examine TR-SDA through extensive simulations on synthetic and real-world datasets.
► TR-SDA delivers significant improvements compared with other state-of-the-art algorithms.

Introduction

Dealing with high-dimensional data has always been a major problem for pattern recognition and machine learning. Typical applications involving high-dimensional data include face recognition, document categorization and image retrieval. Finding a low-dimensional representation of a high-dimensional space, namely dimensionality reduction, is thus of great practical importance. The goal of dimensionality reduction is to reduce the complexity of the original space and embed the high-dimensional space into a low-dimensional one while keeping most of the desired intrinsic information [1], [2]. The desired information can be discriminative [11], [12], [15], [16], [17], geometrical [1], [2], [13], [14], [46], or both discriminative and geometrical [19], [20], [21], [22], [23]. Among dimensionality reduction methods, Linear Discriminant Analysis (LDA) [11], [12] is the most popular and has been widely used in many classification applications. The goal of LDA is to find the optimal low-dimensional representation of the original dataset by maximizing the between-class scatter while minimizing the within-class scatter, as measured by the scatter matrices $S_b$ and $S_w$. The original formulation of LDA, known as Fisher LDA [11], can only deal with binary classification. When solving multi-class classification problems, the basic LDA has to be extended using one of two main criteria: the ratio trace criterion $\max_W \operatorname{Tr}[(W^T S_w W)^{-1}(W^T S_b W)]$ and the trace ratio criterion $\max_{W^T W = I} \operatorname{Tr}(W^T S_b W)/\operatorname{Tr}(W^T S_w W)$.
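
To make the two criteria concrete, the following sketch (our own illustration, not code from the paper) evaluates both objectives for a given projection matrix `W`, assuming the scatter matrices `Sb` and `Sw` have already been computed; the function names are ours:

```python
import numpy as np

def ratio_trace_obj(W, Sb, Sw):
    """Ratio trace objective Tr[(W^T Sw W)^{-1} (W^T Sb W)]."""
    return np.trace(np.linalg.solve(W.T @ Sw @ W, W.T @ Sb @ W))

def trace_ratio_obj(W, Sb, Sw):
    """Trace ratio objective Tr(W^T Sb W) / Tr(W^T Sw W); W assumed orthonormal."""
    return np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
```

A useful sanity check is that the ratio trace objective is invariant under any invertible right-multiplication $W \to WM$, which is one reason the resulting projection need not be orthogonal, whereas the trace ratio objective is only meaningful under the constraint $W^T W = I$.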

In ratio trace (or determinant ratio) LDA, the within-class scatter matrix is assumed to be nonsingular, and the optimal projection can be found by generalized eigenvalue decomposition (GEVD) [35]. However, ratio trace LDA may confront an ill-posed problem when the number of data points is smaller than the number of features [34], [44], [45]. Several variants of ratio trace LDA have been proposed to solve this problem, such as null-space LDA [25], uncorrelated LDA [26], LDA/GSVD [27] and Discriminative Common Vectors [28]. Another widely used criterion for LDA is the trace ratio criterion. Unlike the former, the trace ratio criterion directly reflects the Euclidean distances between inter-class and intra-class data points. In addition, the optimal projection obtained by trace ratio LDA is orthogonal, while the one obtained by ratio trace LDA is non-orthogonal. Recently, there has been increasing interest in finding orthogonal projections for dimensionality reduction methods [29], [30], [31]. As described in [4], when evaluating similarities between data points based on Euclidean distance, a non-orthogonal projection may put different weights on different projection directions and thus change the similarities, while an orthogonal projection preserves them. Trace ratio LDA therefore tends to perform empirically better than ratio trace LDA in many classification problems. In this paper, we focus on trace ratio LDA, which for convenience we denote as TR-LDA.

Solving the trace ratio problem of LDA directly has always been difficult, because there is no closed-form solution [7]. Several attempts have been made to find the optimal solution [3], [4], [5], [6], [7], [8]. Guo et al. [3] pointed out that the original TR problem can be converted into an equivalent trace difference problem, which can be solved by a heuristic bisection method. Recently, Wang et al. [4] proposed another efficient algorithm, called the ITR algorithm, which finds the optimal solution through an iterative procedure and is faster than the former. In this paper, we further analyze the ITR algorithm and discuss the drawbacks of its training strategy. We then propose a new efficient algorithm, called the ITR-Score algorithm, to improve the original ITR algorithm. The proposed algorithm can be viewed as a greedy strategy for finding the optimum of the TR problem, and is hence more efficient than the previous ones.
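
The iterative scheme underlying these solvers can be sketched as follows (our own simplified illustration, not the paper's ITR-Score implementation): given the current ratio value $\lambda_t$, the projection is updated from the top-$d$ eigenvectors of $S_b - \lambda_t S_w$, and $\lambda$ is then recomputed from the new projection:

```python
import numpy as np

def iterative_trace_ratio(Sb, Sw, d, n_iter=100, tol=1e-10):
    """Solve max Tr(W^T Sb W)/Tr(W^T Sw W) s.t. W^T W = I by iteration.

    Each step takes the top-d eigenvectors of Sb - lam*Sw, then updates
    lam to the trace ratio of the new W.
    """
    lam = 0.0
    W = None
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(Sb - lam * Sw)  # eigenvalues ascending
        W = vecs[:, -d:]                            # d largest eigenvectors
        lam_new = np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)
        if abs(lam_new - lam) < tol:
            return W, lam_new
        lam = lam_new
    return W, lam
```

Because `eigh` returns orthonormal eigenvectors, every iterate automatically satisfies the orthogonality constraint $W^T W = I$.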

In general, TR-LDA is supervised, which means it requires label information. Although TR-LDA works well [3], [4], it needs a considerable number of labeled data points to deliver satisfactory results. In many practical cases, however, obtaining sufficient labeled data for training is problematic, because labeling a large amount of data is time-consuming and costly. On the other hand, unlabeled data may be abundant and easy to obtain in the real world. Thus, semi-supervised learning methods [19], [20], [21], [22], [23], [24], [47], which incorporate both labeled and unlabeled data into the learning procedure, have become an effective alternative to purely supervised learning. In this paper, we propose an orthogonal constrained framework for semi-supervised learning. Under this framework, TR-LDA can be extended to a corresponding semi-supervised version called trace ratio based semi-supervised discriminant analysis (TR-SDA). Furthermore, by analyzing the relationship between the supervised and semi-supervised TR problems, we show that the proposed ITR-Score algorithm can be extended to solve the semi-supervised TR problem.

The main contributions of this paper are summarized as follows:

  • (1)

    As an extension of TR-LDA, the proposed TR-SDA can find an optimal low-dimensional projection that preserves the discriminative information embedded in the labeled set as well as the geometric information embedded in both the labeled and unlabeled sets. As in TR-LDA, the optimal projection obtained by TR-SDA is orthogonal, so Euclidean-distance-based similarities between data points are preserved without change.

  • (2)

    We propose a new method, the ITR-Score algorithm, to solve the supervised and semi-supervised TR problems. By improving the original ITR algorithm in both its initialization and its training strategy, the proposed method converges faster, indicating that the ITR-Score algorithm is more efficient than the ITR algorithm.

  • (3)

    We propose an orthogonal constrained framework for semi-supervised learning. Under this framework, the TR-SDA algorithm can be related to several existing semi-supervised algorithms such as SDA [19], Lap-LDA [21], SSMMC [22] and SSDR [20]. In short, our algorithm can be viewed as an improvement or extension of these algorithms.

  • (4)

    The proposed TR-SDA can easily be extended to a nonlinear version using the kernel trick [32], [33]. In this paper, we restrict the nonlinear projection to lie in an orthogonal basis of a high-dimensional Hilbert space and then perform linear dimensionality reduction on that basis. Finally, we connect TR-LDA, TR-SDA and their corresponding kernel versions in a unified form.

The rest of this paper is organized as follows. In Section 2, we briefly describe the basic ideas of LDA and TR-LDA, review previous work on solving the TR problem, and propose our improved method. In Section 3, we propose an orthogonal constrained framework for semi-supervised learning and extend TR-LDA to its corresponding semi-supervised version, TR-SDA. In Section 4, we extend our algorithm to nonlinear problems using the kernel trick. Simulation results are presented in Section 5 and conclusions are drawn in Section 6.

Section snippets

Trace ratio problem

In this section, we first review the basic idea of Linear Discriminant Analysis. The goal of LDA is to find a linear transformation matrix $W \in \mathbb{R}^{D \times d}$ that maximizes the between-class scatter while minimizing the within-class scatter. Let $X = \{x_1, x_2, \ldots, x_l\} \in \mathbb{R}^{D \times l}$ be the training set, where each $x_i$ belongs to a class $c_i \in \{1, 2, \ldots, c\}$. Let $l_i$ be the number of data points in the $i$th class and $l$ the total number of data points; we define the between-class scatter matrix $S_b$, within-class…
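
The scatter matrices described above can be computed as in the following sketch (our own illustration; $S_b$ uses the class sizes $l_i$ as weights, matching the usual definitions):

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class Sb and within-class Sw scatter for columns of X.

    X: (D, l) data matrix whose columns are samples; y: (l,) class labels.
    The total scatter satisfies St = Sb + Sw.
    """
    mu = X.mean(axis=1, keepdims=True)
    D = X.shape[0]
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T  # weighted by l_i
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
    return Sb, Sw
```

The decomposition $S_t = S_b + S_w$ of the total scatter gives a quick correctness check for any implementation.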

Orthogonal constrained semi-supervised learning framework

The algorithms for solving the TR problem discussed so far are all supervised. To exploit unlabeled data points and achieve satisfactory results, many works incorporate both the labeled and unlabeled sets into the learning procedure [19], [20], [21], [22], [23]. In this paper, we first introduce a semi-supervised learning framework. Let $X = \{X_l, X_u\}$ denote the whole dataset, where $X_l = \{x_i\}_{i=1}^{l}$ is the labeled set with corresponding label matrix $Y = \{y_i\}_{i=1}^{l}$ and $X_u = \{x_i\}_{i=l+1}^{l+u}$ is the unlabeled set, the…
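
A common ingredient of such frameworks (e.g. SDA [19]) is a graph Laplacian built over the labeled and unlabeled points together, used to regularize the scatter matrices with a term of the form $\alpha X L X^T$. A minimal sketch, with binary k-NN weights as a simplifying assumption (heat-kernel weights are also common):

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - S of a symmetrized k-NN graph.

    X: (D, n) matrix whose columns are the labeled plus unlabeled points.
    """
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. dists
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # nearest neighbors, excluding i
        S[i, nbrs] = 1.0
    S = np.maximum(S, S.T)  # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S
```

Since $L$ is positive semi-definite with zero row sums, the regularizer $\operatorname{Tr}(W^T X L X^T W)$ penalizes projections that map graph-neighboring points far apart.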

Kernelization

The proposed TR-SDA is a linear algorithm. In this section, we will extend it to solve the nonlinear problem using kernel trick [32], [33]. For convenience, we denote the kernel version of TR-SDA as TR-KSDA.

The basic idea of the kernel trick is to map the original data space to a high-dimensional Hilbert space via $\phi: X \to \mathcal{F}$, then perform linear dimensionality reduction in the new space. Let $\phi(X) = \{\phi(x_1), \phi(x_2), \ldots, \phi(x_{l+u})\}$ be the mapped data in this high-dimensional space; we assume the map can be implicitly…
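
In practice the implicit map is accessed only through a kernel (Gram) matrix. As a hedged sketch (our own illustration, with a Gaussian kernel as an assumed choice), the Gram matrix and its centered version, which corresponds to mean-centering the mapped data in $\mathcal{F}$, can be computed as:

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2) for columns of X."""
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-gamma * d2)

def center_gram(K):
    """Center the implicit features: K <- H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H
```

Centering matters because the scatter matrices in $\mathcal{F}$ are defined relative to the mean of the mapped data, which is never formed explicitly.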

Related work

In this paper, we propose a semi-supervised version of TR-LDA. To the best of our knowledge, several recently proposed semi-supervised dimensionality reduction methods share the objectives of this paper [19], [20], [21], [22]. By analyzing the strategies and mechanisms of these algorithms, we show that our proposed algorithm is, in fact, an improvement or extension of them.

Simulations and results

We evaluate our algorithms on several synthetic and real-world datasets. For the synthetic datasets, we use a 2D Gaussian dataset to show the discriminative boundary learned by our algorithm and a 3D Gaussian dataset to visualize the output set in a 2D reduced space. We also use a two-moon dataset to handle a nonlinear classification problem and show the learned discriminative boundary. In addition, we demonstrate visualization and classification on four real…

Conclusions

A new efficient algorithm for finding the optimal solution of the trace ratio problem is proposed. Based on this algorithm, we derive an orthogonal constrained semi-supervised learning framework and show that the algorithm can be extended to solve the corresponding semi-supervised problems. The essence of the proposed algorithm is that it incorporates the unlabeled set into the learning procedure to preserve the geometrical structure embedded in both the labeled and unlabeled sets. Also, the algorithm…

Mingbo Zhao is currently pursuing his Ph.D. degree at the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He received his B.Eng. degree and master degree from the Department of Electronic Engineering, Shanxi University, Shanxi, PR China in 2005 and 2008, respectively. His current interests include data mining, machine learning, pattern recognition, and their applications.

References (46)

  • Y. Pang et al.

    Outlier-resisting graph embedding

    Neurocomputing

    (2010)
  • X. Wang et al.

    Semi-supervised Gaussian process latent variable model with pairwise constraints

    Neurocomputing

    (2010)
  • J.B. Tenenbaum et al.

    A global geometric framework for nonlinear dimensionality reduction

    Science

    (2000)
  • S. Roweis et al.

    Nonlinear dimensionality reduction by locally linear embedding

    Science

    (2000)
  • H. Wang, S. Yan, D. Xu, X. Tang, T. Huang, Trace ratio vs. ratio trace for dimensionality reduction, in: Proceedings of...
  • Y. Jia et al.

    Trace ratio problem revisited

    IEEE Transactions on Neural Networks

    (2009)
  • S. Yan, X. Tang, Trace quotient problem revisited, in: Proceedings of ECCV, 2006, pp....
  • J. Ye, Least square linear discriminant analysis, in: Proceedings of ICML,...
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Annals of Eugenics

    (1936)
  • P.N. Belhumeur et al.

    Eigenfaces vs. fisherfaces: recognition using class specific linear projection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1997)
  • X. He et al.

    Face recognition using Laplacianfaces

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2005)
  • M. Belkin et al.

    Laplacian eigenmaps for dimensionality reduction and data representation

    Neural Computation

    (2003)
  • H. Li et al.

    Efficient and robust feature extraction by maximum margin criterion

    IEEE Transactions on Neural Networks

    (2006)

    Zhao Zhang is currently working towards his Ph.D. degree at the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He received his B.Eng. (First Hons.) and master degrees from the Department of Computer Science and Technology, Nanjing Forestry University, Nanjing, PR China in 2008 and 2010, respectively. His current interests include machine learning, pattern recognition and computational intelligence.

    Tommy W. S. Chow (IEEE M'93–SM'03) received his B.Sc. (First Hons.) and Ph.D. degrees from the University of Sunderland, Sunderland, UK. He joined the City University of Hong Kong, Hong Kong, as a Lecturer in 1988 and is currently a Professor in the Electronic Engineering Department. His research interests are in the area of machine learning, including supervised and unsupervised learning, data mining, pattern recognition and fault diagnostics. He worked for NEI Reyrolle Technology at Hebburn, England, developing a digital simulator for a transient network analyser. He then worked on a research project involving a high-current-density current collection system for superconducting direct-current machines, in collaboration with the Ministry of Defence (Navy) at Bath, England, and International Research and Development at Newcastle upon Tyne. He has authored or coauthored over 120 technical papers in international journals, 5 book chapters, and over 60 technical papers in international conference proceedings.
