
Neurocomputing

Volume 171, 1 January 2016, Pages 1629-1636

Brief Papers
Recursive Dimension Reduction for semisupervised learning

https://doi.org/10.1016/j.neucom.2015.06.062

Abstract

Semisupervised Dimension Reduction (SDR) Using Trace Ratio Criterion (TR-FSDA) is an effective iterative SDR algorithm, which introduces a flexible regularization term ||F - X^T W||^2 to relax the hard linear constraint in SDA that the low-dimensional representation F must lie in the linear subspace spanned by the data matrix X. We observe, however, that TR-FSDA may take meaningless features into the iteration and cannot always be guaranteed to converge. In this paper, we propose a novel method for SDR, referred to as Recursive Dimension Reduction for Semisupervised Learning (RDS). Instead of solving the non-trivial TR problem using the iterative algorithm of TR-FSDA, we solve the objective function of TR-FSDA using a newly developed recursive procedure. In each recursion, only one projection vector and a one-dimensional data representation are produced, by solving a standard Rayleigh Quotient problem. Our algorithm is free of the convergence issue, since it solves the objective directly and requires no iterative strategy in finding each of the projection vectors. Experiments on four face databases, one object database, one shape image database, and one Handwritten Digit database demonstrate the effectiveness of RDS.

Introduction

Dimension reduction (DR) has attracted much attention in pattern recognition, computer vision, etc., since it can effectively address the so-called "curse of dimensionality" problem. Two of the most well-known DR algorithms are Principal Component Analysis (PCA) [1] and Fisher Linear Discriminant (FLD) [2]. In addition, by considering different aspects of DR, different algorithms have been developed [1], [2], [3], [4], [5], [6], [7], [8], [40], [41], [42], [43], e.g., to yield nonnegative projections [40], [41] by using nonnegative matrix factorization, and to improve discrimination by using the parallel vector field embedding algorithm [42], [43].

PCA and FLD are two linear subspace learning algorithms, which, however, cannot discover essential data structures that are nonlinear. Recently, a number of manifold-based learning techniques, e.g., Isometric Feature Mapping (ISOMAP) [4], Local Linear Embedding (LLE) [5], and Laplacian Eigenmap (LE) [6], have been developed to resolve this problem. The central idea of manifold learning is to find an intrinsic low-dimensional embedding of the data. However, these classical approaches cannot map new, unseen samples. To solve this out-of-sample problem, He et al. proposed Locality Preserving Projections (LPP) [7]. Yang et al. [8] proposed a "classification-oriented" technique, called Unsupervised Discriminant Projection (UDP). Yan et al. [9] recently proposed a general formulation known as graph embedding, which provides a unified view of a broad set of DR techniques.

Supervised learning may require collecting a large number of labeled data points, and directly labeling these data is not only time-consuming but also expensive. Furthermore, supervised learning algorithms may not obtain promising results when the label information is insufficient. Therefore, semisupervised learning plays an important role in solving these problems. In the literature, many semisupervised learning algorithms for classification have been developed, e.g., Transductive SVM (TSVM) [10], [11] and graph-based semisupervised learning algorithms [12], [13], [14], [15]. Among them, GFHF [12] and LGC [13] are two label propagation approaches designed to predict the labels of the unlabeled data in the training set, which, however, cannot deal with new, unseen data. The linear LapRLS [14] can be viewed as an "out-of-sample" extension of LGC/GFHF. Recent years have witnessed numerous research activities on semisupervised DR [16], [17], [18], [19], [20] for different tasks. However, these algorithms suffer from the constraint that the low-dimensional data representation F must lie within the linear space spanned by all the training samples. To relax this hard linear constraint, Nie et al. developed Flexible Manifold Embedding (FME) [21], a multidimensional extension of LapRLS. In [22], Nie et al. proposed Semisupervised Dimensionality Reduction via Virtual Label Regression (VLR), which can be viewed as a two-step formulation of FME with outlier detection.
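As an illustration of the label-propagation idea behind GFHF, the harmonic solution assigns each unlabeled point a soft label determined by its graph neighbors. The following is a minimal sketch (the function name is ours; we assume a symmetric affinity matrix with the labeled points indexed first, as in the setting of this paper):

```python
import numpy as np

def gfhf_propagate(W, y_l):
    """GFHF-style harmonic label propagation (illustrative sketch).

    W   : (n, n) symmetric affinity matrix; the first l rows/columns
          correspond to the labeled points.
    y_l : (l, C) one-hot label matrix for the labeled points.
    Returns an (n - l, C) matrix of soft labels for the unlabeled points.
    """
    l = y_l.shape[0]
    D = np.diag(W.sum(axis=1))
    L = D - W                      # graph Laplacian
    L_uu = L[l:, l:]               # unlabeled-unlabeled block
    L_ul = L[l:, :l]               # unlabeled-labeled block
    # Harmonic solution: F_u = -L_uu^{-1} L_ul Y_l
    return -np.linalg.solve(L_uu, L_ul @ y_l)
```

Note that the solution is transductive: it only produces labels for the unlabeled points already present in the graph, which is exactly the limitation that out-of-sample extensions such as linear LapRLS address.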

Considering that in Semisupervised Discriminant Analysis (SDA) [19], [20] the manifold smoothness term is introduced into the objective function of FLD and the low-dimensional data representation F is constrained to be in the space spanned by all the training samples, Semisupervised Dimension Reduction Using Trace Ratio Criterion (TR-FSDA) was proposed to relax this constraint by modeling the mismatch between F and h(X) = X^T W [23]. To solve the resulting non-trivial Trace Ratio (TR) optimization problem, an iterative algorithm is designed to find F and W simultaneously. TR-FSDA has demonstrated promising results for different recognition tasks.
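The generic iterative scheme underlying such TR solvers alternates between fixing the trace-ratio value and solving an eigenproblem. The sketch below illustrates this standard scheme only, not the full TR-FSDA update (which also optimizes F); the function name and stopping rule are ours:

```python
import numpy as np

def trace_ratio(A, B, r, n_iter=50, tol=1e-10):
    """Generic iterative solver for max_{W^T W = I} tr(W^T A W) / tr(W^T B W).

    At each step, W is set to the top-r eigenvectors of A - lam * B,
    then the trace-ratio value lam is updated from the new W; lam is
    known to increase monotonically under this scheme.
    """
    lam = 0.0
    W = None
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(A - lam * B)
        W = vecs[:, np.argsort(vals)[::-1][:r]]   # top-r eigenvectors
        new_lam = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return W, lam
```

On a toy problem with A = diag(4, 1) and B = I, for example, the scheme converges in two steps to the leading direction with trace-ratio value 4.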

In this paper, we target the semisupervised DR problem using a well-designed recursive procedure. Our algorithm is based on TR-FSDA. Despite its demonstrated performance advantage, TR-FSDA suffers from two restrictions. First, the meaningful discriminant projection vectors may not be correctly found. TR-FSDA adopts an iterative algorithm to address the non-trivial TR optimization problem. At each iteration, it solves a problem similar to Maximum Margin Criterion (MMC) [24], which optimizes max_{W, W^T W = I} tr(W^T A W - W^T B W), where tr(.) denotes the trace, and A and B are graph matrices encoding different types of information about the data [9]. As claimed in [24], the most meaningful discriminant projection vectors should be selected as the eigenvectors corresponding to the eigenvalues of A - B that are greater than or equal to zero. TR-FSDA does not take this into account and arbitrarily chooses the number of discriminant projection axes at each iteration, so meaningless discriminant projection vectors from previous iterations may affect the generation of optimal discriminant projection vectors in the following iterations. Second, the convergence cannot be guaranteed. At the t-th iteration, TR-FSDA requires the calculation of a matrix Z_t, which depends on the TR value lambda_t calculated from the projection matrix of the previous step and on the graph matrices L-bar_a and L-bar_b, whose exact definitions can be found in [23]. Z_t must be positive definite to ensure the convergence of TR-FSDA, which is not always true in real applications and depends on the values of the parameters involved in TR-FSDA. To sidestep this problem, Huang et al. [23] proposed to simply discard the parameter combinations that cause non-convergence. However, doing so is unsatisfactory, since those parameters are meant to balance different terms to improve the recognition results. Thus, TR-FSDA may achieve undesired results.
To address these problems, a new recursive procedure is designed to calculate F and W. At each recursion, our approach solves a standard Rayleigh Quotient problem rather than the non-trivial TR optimization problem, and thus avoids the problems that exist in TR-FSDA.


TR-FSDA revisited

Given a data matrix X = [x_1, x_2, ..., x_l, x_{l+1}, x_{l+2}, ..., x_n] in R^{d x n}, the first l points x_i (i <= l) are labeled and the remaining u points are unlabeled. The label of a labeled point x_i is y_i in {1, 2, ..., C}, where C denotes the number of classes. We also define a linear regression function h(X) = X^T W, where W in R^{d x r} is the projection matrix and r is the dimensionality of the lower-dimensional subspace. We construct the following graph using the popular method: if x_i is among the k nearest neighbors of x_j or x_j
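A common instantiation of this k-nearest-neighbor graph construction uses heat-kernel weights and symmetrizes via the "or" rule. The sketch below is illustrative only (the helper name and the sigma parameter are our assumptions, not definitions from the paper):

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Symmetric k-NN affinity graph with heat-kernel weights (sketch).

    X : (d, n) data matrix, one sample per column.
    S[i, j] = exp(-||x_i - x_j||^2 / sigma) if x_i is among the k nearest
    neighbors of x_j, or vice versa; otherwise S[i, j] = 0.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    S = np.zeros((n, n))
    for j in range(n):
        idx = np.argsort(d2[:, j])[1:k + 1]     # skip the point itself
        S[idx, j] = np.exp(-d2[idx, j] / sigma)
    return np.maximum(S, S.T)                   # symmetrize: the "or" rule
```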

RDS

In this section, we develop a new approach, called Recursive Dimension Reduction for Semisupervised Learning (RDS), which uses a novel recursive procedure to extract the discriminant projection vectors.
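To convey the flavor of such a recursive scheme, the sketch below extracts one direction per step by solving a generalized Rayleigh Quotient, then deflates before the next step. This is only an illustrative sketch under our own assumptions (the function name, the deflation projector, and the ridge term are ours); it is not the RDS update derived in the paper:

```python
import numpy as np
from scipy.linalg import eigh

def recursive_directions(A, B, r):
    """Extract r projection vectors one at a time (illustrative sketch).

    Each step solves the generalized Rayleigh Quotient
        max_w (w^T A w) / (w^T B w),
    i.e. takes the leading generalized eigenvector of the pair (A, B),
    then deflates so that later vectors capture new directions.
    """
    d = A.shape[0]
    P = np.eye(d)                       # deflation projector
    Ws = []
    for _ in range(r):
        Ap = P @ A @ P.T
        Bp = P @ B @ P.T + 1e-8 * np.eye(d)   # ridge keeps Bp positive definite
        vals, vecs = eigh(Ap, Bp)             # ascending generalized eigenvalues
        w = vecs[:, -1]                       # leading generalized eigenvector
        w /= np.linalg.norm(w)
        Ws.append(w)
        P = P - np.outer(w, w) @ P            # remove the found direction
    return np.column_stack(Ws)
```

Because each step is a standard (generalized) eigenproblem with a closed-form solution, no inner iterative loop, and hence no convergence condition, is involved in finding each direction.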

Experiments

We evaluate our algorithm on four face databases, UMIST [31], ORL [32], YALE [33], and FERET [34], a shape image database, MPEG-7 [35], an object database, COIL20 [36], and a Handwritten Digit (HD) database [37]. In [22], the authors inferred that COIL20 and UMIST have a clear manifold structure. Table 1 describes the details of each database used in the experiments. For the UMIST, MPEG-7, COIL20, and HD databases, 50% of the samples per class are randomly selected as the training

Conclusion

The primary goal of TR-FSDA is to better cope with data sampled from a nonlinear manifold that is somewhat close to a linear subspace, by relaxing the hard linear constraint F = X^T W in SDA. TR-FSDA adopts an iterative algorithm to solve the resulting optimization problem. However, the matrix Z_t at each iteration is required to be positive definite to ensure the convergence of TR-FSDA, which does not always hold in real applications. Furthermore, the difference formulation in TR-FSDA may lead

Acknowledgment

The authors gratefully acknowledge support from the Scientific Research Foundation for Advanced Talents and Returned Overseas Scholars of Nanjing Forestry University (163070679), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (14KJB520018), the Natural Science Foundation of Jiangsu Province of China (BK2012399), the Practice Innovation Training Program Projects for Jiangsu College Students, and the Natural Science Foundation of China (61101197, 61401214, and 61402192).

Qiaolin Ye received the BS degree in Computer Science from Nanjing Institute of Technology, Nanjing, China, in 2007, the MS degree in Computer Science and Technology from Nanjing Forestry University, Jiangsu, China, in 2009, and the Ph.D. degree in Pattern Recognition and Intelligence System from Nanjing University of Science and Technology, Jiangsu, China, in 2013.

He is currently an associate professor with the computer science department at Nanjing Forestry University, Nanjing, China. He has authored more than 30 scientific papers in pattern recognition, machine learning, and data mining. His research interests include machine learning, data mining, and pattern recognition.

References (43)

  • C. Xiang et al., Face recognition using recursive Fisher linear discriminant, IEEE Trans. Image Process. (2006)
  • J. Yang et al., Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • V. Vapnik, Statistical Learning Theory (1998)
  • R. Collobert et al., Large scale transductive SVMs, J. Mach. Learn. Res. (2006)
  • X. Zhu et al., Semi-supervised learning using Gaussian fields and harmonic functions, Proc. ICML (2003)
  • D. Zhou et al., Learning with local and global consistency, Proc. NIPS (2004)
  • M. Belkin et al., Manifold regularization: a geometric framework for learning from examples, J. Mach. Learn. Res. (2006)
  • V. Sindhwani et al., Beyond the point cloud: from transductive to semi-supervised learning, Proc. Int. Conf. Mach. Learn. (2005)
  • S.M. Xiang et al., Nonlinear dimensionality reduction with local spline embedding, IEEE Trans. Knowl. Data Eng. (2009)
  • D. Cai et al., Semi-supervised discriminant analysis, Proc. ICCV (2007)

    T.M. Yin received the B.S. degree in forestry and the Ph.D degree in genetics and molecular biology from Nanjing Forestry University, Jiangsu, China. Dr. Yin’s main research interests focus on genomics, gene function, and molecular breeding of woody plants. His representative achievements include: (1) contribution towards construction of the genetic platforms for tree genomic studies, (2) mapping and cloning of genes underlying important traits in woody plants, (3) development of genetic tools and marker resources for applicability of the sequenced poplar genome to studies of alternate poplar genotypes and species and (4) discovery on the genetic mechanism triggering the evolution process from hermephordites to diecious plants and genomic proofs for parapatric speciation.

    In 2011, Dr. Yin won the Outstanding Young Scientist Fund of Natural Science Fund of China. In 2010, Dr. Yin was nominated as one of the top ten outstanding young scientists in Jiangsu province of China. In 2008, he was the awardee of the Cheung Kong Scholars Program of China. The other honors recognized for Dr. Yin include distinguished contributor and awardee for Science and Technology Development at Oak Ridge National Lab, Department of Energy, U.S.A.; awardee of New Century Excellent Talents Program of China; awardee of Jubilee Award issued by International Fund of Sweden. Contributing editor for book Tree Genetics and Breeding, which won the national second prize for Excellent Scientific and Technical Books. Dr. Yin is an active reviewer for some famous international journals, such as Genome Research, New Phytologist, Molecular Breeding etc. He also serves as the academic editor for PLosOne.

    Shangbing Gao received the BS degree in mathematics from the Northwestern Polytechnical University in 2003. He received the MS degree in applied mathematics from the Nanjing University of Information and Science and Technology in 2006. He is now working at Huaiyin institute of technology as an assistant lecturer. He is currently pursuing the Ph.D. degree with School of Computer Science and Technology, Nanjing University of Science and Technology (NUST). He is on the subject of pattern recognition and intelligence systems. His current research interests include pattern recognition and computer vision.
