
Information Sciences

Volume 324, 10 December 2015, Pages 286-309

Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction

https://doi.org/10.1016/j.ins.2015.06.021

Abstract

Semi-supervised dimensionality reduction is an important topic in pattern recognition and machine learning. Over the past decade, Laplacian Regularized Least Squares (LapRLS) and Semi-supervised Discriminant Analysis (SDA) have been two widely used semi-supervised dimensionality reduction methods. In this paper, we show that SDA and LapRLS can be unified into a constrained manifold regularized least square framework. The manifold term, however, cannot fully exploit the underlying discriminative information. We therefore introduce a new and effective semi-supervised dimensionality reduction method, called Learning from Local and Global Discriminative Information (LLGDI), to address this problem. The proposed LLGDI method adopts a set of local classification functions to preserve both the local geometrical and the discriminative information of a dataset. It also adopts a global classification function to preserve the global discriminative information, together with an uncorrelated constraint that yields the projection matrix, so that the regression and dimensionality reduction problems are solved simultaneously. As a result, the LLGDI method preserves local discriminative and manifold information as well as global discriminative information. Theoretical analysis and extensive simulations presented in the paper show the effectiveness of the LLGDI algorithm. The results also demonstrate that LLGDI achieves superior performance compared with other existing methods.

Introduction

Dealing with high-dimensional data has always been a major challenge in pattern recognition and machine learning. Typical applications include face recognition, document categorization, and image retrieval. Finding a low-dimensional representation of a high-dimensional space is therefore of great practical importance. The goal of dimensionality reduction is to reduce the complexity of the input space and embed the high-dimensional space into a low-dimensional one while preserving most of the desired intrinsic information [16], [18], [28], [36], [38], [39], [40], [41], [42], [43]. Among all dimensionality reduction techniques, Principal Component Analysis (PCA) [19] and Linear Discriminant Analysis (LDA) [1] are the two most popular. PCA pursues the directions of maximum variance for optimal reconstruction, while LDA, a supervised method, finds the optimal projection $V$ that maximizes the between-class scatter matrix $S_b$ and minimizes the within-class scatter matrix $S_w$ in a low-dimensional subspace. Owing to its use of label information, LDA achieves better classification results than PCA, provided that sufficient labeled samples are available [1], [4].
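For concreteness, the LDA criterion in its standard trace-ratio form [1] is

$$V^{*}=\arg\max_{V}\ \operatorname{tr}\!\left((V^{\top}S_{w}V)^{-1}\,V^{\top}S_{b}V\right),\qquad S_{b}=\sum_{k=1}^{c} n_{k}(\mu_{k}-\mu)(\mu_{k}-\mu)^{\top},\qquad S_{w}=\sum_{k=1}^{c}\sum_{x_{i}\in C_{k}}(x_{i}-\mu_{k})(x_{i}-\mu_{k})^{\top},$$

where $\mu_k$ and $n_k$ are the mean and size of class $C_k$, and $\mu$ is the global mean of the labeled samples.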

In general, supervised methods deliver better performance than unsupervised methods, but obtaining a sufficient number of labeled samples for training can be problematic because labeling a large number of samples is costly and laborious. On the other hand, unlabeled samples are abundant and easily obtained in numerous real-world cases. Compared with supervised learning approaches that rely only on labeled training data, the idea of semi-supervised learning is to exploit labeled and unlabeled data together to improve learning performance [2], [3], [12], [13], [14], [44], [45]. In brief, semi-supervised learning can be viewed as a framework that provides an efficient alternative to labeling unlabeled data. Well-known semi-supervised learning methods include Gaussian Fields and Harmonic Functions (GFHF) [45], Learning with Local and Global Consistency (LLGC) [44] and Special Label Propagation (SLP) [12]. These methods work in a transductive way by propagating label information from the labeled set to the unlabeled set. This approach is efficient, but it cannot predict the class labels of unseen samples, a drawback known as the out-of-sample problem. In contrast, semi-supervised dimensionality reduction methods not only reduce the dimensionality but also naturally solve the out-of-sample problem, and thus usually deliver better results in real-world applications.
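To make the transductive mechanism concrete, the following is a minimal numpy sketch in the spirit of the LLGC closed-form solution [44]; the function name, the parameter `alpha` and the input conventions are ours, not the cited paper's notation.

```python
import numpy as np

def llgc_propagate(W, Y, alpha=0.9):
    """Label propagation in the spirit of LLGC [44] (illustrative sketch).

    W : (n, n) symmetric affinity matrix over labeled + unlabeled points.
    Y : (n, c) initial label matrix; rows of unlabeled points are all zero.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt              # symmetrically normalized affinity
    n = W.shape[0]
    # Closed form of the iteration F <- alpha*S*F + (1-alpha)*Y
    F = np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * Y)
    return F.argmax(axis=1)                      # predicted class per point
```

Note that the solution is defined only on the points used to build `W`; a new sample requires rebuilding the graph and re-solving, which is exactly the out-of-sample limitation discussed above.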

Two widely used semi-supervised methods are Semi-supervised Discriminant Analysis (SDA) [3] and Laplacian Regularized Least Squares (LapRLS) [9]. These two methods share the same concept of dimensionality reduction: they first construct a graph Laplacian matrix that approximates the manifold structure using both labeled and unlabeled samples, and then perform dimensionality reduction by adding the graph Laplacian matrix as a regularization term to the original objective function of LDA or Regularized Least Squares (RLS). As a result, the discriminative structure embedded in the labeled samples and the geometrical structure embedded in both labeled and unlabeled data are preserved. In fact, LapRLS is essentially derived from the perspective of regression rather than classification: it trains a linear classification model by regressing the labeled set on the class labels, whereas SDA is a subspace learning method aimed at solving classification problems. Although the two stem from different supervised methods, we show in this paper that both SDA and LapRLS can be unified under a regularized least square framework, so that both can solve regression as well as subspace learning problems.
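As an illustration of this regression view, below is a minimal numpy sketch of a linear LapRLS-style solver under common conventions (the exact weighting of the terms in [9] may differ; the function and variable names are ours):

```python
import numpy as np

def laprls_linear(Xl, Yl, X, L, gamma_A=1e-2, gamma_I=1e-2):
    """Closed-form linear LapRLS sketch: least-squares fit on the labeled set
    plus a ridge penalty and a graph-Laplacian manifold penalty on all data.

    Xl : (D, l) labeled samples      Yl : (c, l) binary label matrix
    X  : (D, n) all samples          L  : (n, n) graph Laplacian
    """
    D, l = Xl.shape
    n = X.shape[1]
    A = (Xl @ Xl.T) / l + gamma_A * np.eye(D) \
        + (gamma_I / n**2) * (X @ L @ X.T)
    W = np.linalg.solve(A, Xl @ Yl.T / l)        # (D, c) linear classifier
    return W
```

A new sample x is then classified by `(W.T @ x).argmax()`, which is how the regression solution sidesteps the out-of-sample problem.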

The connection and theoretical similarities between SDA and LapRLS can be elaborated under the least square framework. It should be noted that the regression term in LapRLS and in the least square framework is supervised, which means that both methods use only the labeled set to train a linear classification function. Since the number of labeled samples is small relative to the number of unlabeled ones, training a linear classification function under such a small sample size can be ineffective [21]. Another issue for semi-supervised methods is how the data samples are used to construct the graph that characterizes the local structure of the data manifold. In SDA and LapRLS, local structure is preserved through a manifold regularization term defined on a Gaussian-function affinity matrix. Such Laplacian matrices, however, cannot capture the discriminative information of the classes, which is essential for classification problems. In addition, the Gaussian-function affinity matrix is oversensitive to the Gaussian variance: even a slight change in the variance may affect the results significantly. It is therefore ill-suited to complicated image classification and visualization problems. Instead of using the Gaussian function for graph construction, several methods, including Locally Linear Reconstruction [22], [23], Local Regression and Global Alignment [30], [31] and Local Spline Regression [26], [27], have been proposed.
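For reference, the Gaussian-function graph construction whose variance sensitivity is discussed above can be sketched as follows (a plain numpy illustration; `k` and `sigma` are free parameters we introduce for the example):

```python
import numpy as np

def gaussian_knn_graph(X, k=5, sigma=1.0):
    """k-NN affinity with Gaussian weights and its unnormalized Laplacian.

    X : (D, n) data matrix, columns are samples.
    """
    n = X.shape[1]
    sq = (X**2).sum(axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]          # k nearest, skipping the point itself
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / (2 * sigma**2))
    W = np.maximum(W, W.T)                            # symmetrize
    Lap = np.diag(W.sum(axis=1)) - W                  # L = D - W
    return W, Lap
```

Rerunning this sketch with slightly different values of `sigma` reshapes `W` substantially, which is precisely the sensitivity noted above.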

In this paper, we introduce a newly developed method, Learning from Local and Global Discriminative Information (LLGDI), to solve the above semi-supervised dimensionality reduction problems. The proposed LLGDI aims to train a classification function by utilizing all available data points. Specifically, our method first relaxes the original supervised regression term into a loss term and a global regression regularization term. The loss term measures the inconsistency between the predicted and initial labels on the labeled set, while the global regression regularization term trains the classification function and yields the projection matrix for the out-of-sample problem. In addition, to characterize both the manifold and the discriminative structure embedded in a dataset, LLGDI employs a set of local classification functions, one for each data point, to predict the labels of its neighboring points. In this way, both the local and the global discriminative information of a dataset are preserved. Finally, to handle the subspace learning problem, we also introduce an uncorrelated constraint into the objective function of LLGDI, so that the regression and subspace learning problems can be solved at the same time.
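To give a flavor of the local-learning ingredient, the sketch below fits one ridge-regression classifier per neighborhood and averages overlapping predictions. It illustrates the general idea only and is not the exact LLGDI formulation; the neighborhood scheme, the regularizer `lam` and all names are our assumptions.

```python
import numpy as np

def local_ridge_predictions(X, F, neighbors, lam=1e-2):
    """Fit a local linear function per point and predict its neighbors' labels.

    X         : (D, n) data matrix        F : (n, c) current soft labels
    neighbors : list of index arrays; neighbors[i] includes i itself
    """
    n, c = F.shape
    pred = np.zeros((n, c))
    counts = np.zeros(n)
    for i in range(n):
        idx = neighbors[i]
        Xi = X[:, idx]
        Xi = Xi - Xi.mean(axis=1, keepdims=True)       # center the local patch
        Wi = np.linalg.solve(Xi @ Xi.T + lam * np.eye(X.shape[0]),
                             Xi @ F[idx])              # local ridge regression
        pred[idx] += Xi.T @ Wi + F[idx].mean(axis=0)   # local prediction + intercept
        counts[idx] += 1
    return pred / counts[:, None]                      # average overlapping patches
```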

The main contributions of this work are as follows. First, we recast the SDA method within a least square framework and establish the connections between SDA and LapRLS. Second, to overcome the limitations of the least square framework of SDA, we develop a new method, called LLGDI, which preserves the local geometrical and discriminative information of a dataset through a normalized local discriminative manifold regularization term. Third, we extend the LLGDI method to perform dimensionality reduction by adding a relaxed uncorrelated constraint to the objective function, so that the regression and subspace learning problems are solved simultaneously. Finally, the relationship between LLGDI and other state-of-the-art methods is analyzed; theoretical analysis shows that many other semi-supervised methods are special cases of the LLGDI method.

This paper is organized as follows. Section 2 presents the notations and briefly reviews LDA, MR and SDA. Section 3 derives the equivalence between SDA and LapRLS under a constrained regularized least square framework. Section 4 presents the proposed LLGDI method for semi-supervised regression and dimensionality reduction through the introduction of a normalized local discriminative manifold regularization term, and discusses the relationship between LLGDI and other state-of-the-art semi-supervised methods. Section 5 presents extensive simulations, and conclusions are drawn in Section 6.

Section snippets

Notations and review of related work

In this section, we first introduce the notations used in our work and briefly review several related works: Linear Discriminant Analysis (LDA), Manifold Regularization (MR) and Semi-supervised Discriminant Analysis (SDA). Let $X=\{X_l,X_u\}=\{x_1,x_2,\ldots,x_{l+u}\}\in\mathbb{R}^{D\times(l+u)}$ be the data matrix whose first $l$ columns are the labeled samples and remaining $u$ columns are the unlabeled samples, and let $Y_l=\{y_1,y_2,\ldots,y_l\}\in\mathbb{R}^{c\times l}$ be the binary label matrix with each column $y_j$ representing the class assignment of $x_j$
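For reference, the manifold regularization term reviewed in this section takes the standard graph-Laplacian form (as in [9]):

$$\operatorname{tr}(F^{\top}LF)=\frac{1}{2}\sum_{i,j=1}^{l+u}W_{ij}\,\lVert f_{i}-f_{j}\rVert^{2},\qquad L=D-W,\quad D_{ii}=\sum_{j}W_{ij},$$

where $W$ is the affinity matrix over all $l+u$ samples and $f_i$ is the prediction for $x_i$; penalizing this term forces nearby points to receive similar predictions.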

On the equivalence between SDA and LapRLS/L under the uncorrelated constraint

Previous work [41] established a relationship between SDA and LapRLS/L using a least square framework, but their equivalence remained unclear. In this section, we analyze the equivalence between SDA and LapRLS/L under an uncorrelated constraint. Specifically, we first introduce a class-label induced semi-supervised discriminant analysis (C-SDA); using C-SDA as a bridge, we then establish the equivalence between SDA and LapRLS/L.
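For readers unfamiliar with the term, an uncorrelated constraint in the uncorrelated-LDA literature is conventionally written as (the paper's exact normalization may differ):

$$V^{\top}S_{t}V=I,\qquad S_{t}=\sum_{i}(x_{i}-\mu)(x_{i}-\mu)^{\top},$$

where $S_t$ is the total scatter matrix; it forces the extracted features $V^{\top}x$ to be mutually uncorrelated with unit variance.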

Learning from local and global discriminative information

The connection between SDA and LapRLS/L sheds light on their relationship for semi-supervised learning. Two issues still need to be addressed: (1) the regression term in both LapRLS and LS-SDA is supervised and uses only the labeled set to train the linear classification function; since the number of labeled samples is small compared with that of unlabeled data, the linear classification function may underfit because of the small sample size [21]; (2) the

Simulations

In this section, we evaluate our algorithms on three synthetic datasets and several real-world datasets. For the synthetic experiments, we use the two-cycle, two-Swiss-roll and two-plate datasets. For the classification experiments, we use eight real-world benchmark datasets to evaluate the performance of the methods, including UMNIST [5], Extended

Conclusion

In this paper, we propose an effective LLGDI method for semi-supervised regression and dimensionality reduction. LLGDI aims to characterize the local and global discriminative manifold structure of a given dataset. This paper theoretically shows that SDA can be recast as a least square framework, and an interesting equivalence between SDA and LapRLS/L is derived under the uncorrelated constraint. As a result, the least square solution can be used for regression as well as subspace

Acknowledgment

This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61300209 and 61402310) and by the Major Program of the National Natural Science Foundation of China (Grant No. 61033013).

References (45)

  • D.B. Graham et al., Characterizing virtual eigensignatures for general purpose face recognition, in: Face Recognition: From Theory to Application, NATO ASI Ser. F, Comput. Syst. Sci., 1998.
  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell., 2005.
  • R.A. Horn et al., Matrix Analysis, 1990.
  • J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell., 1994.
  • K.C. Lee et al., Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Anal. Mach. Intell., 2005.
  • B. Leibe et al., Analyzing appearance and contour based methods for object categorization.
  • S.A. Nene et al., Columbia Object Image Library (COIL-100), 1996.
  • F. Nie et al., A general graph-based semi-supervised learning with novel class discovery, Neural Comput. Appl., 2010.
  • F. Nie et al., Flexible manifold embedding: a framework for semi-supervised and unsupervised dimensionality reduction, IEEE Trans. Image Process., 2010.
  • F. Nie, D. Xu, I.W.H. Tsang, C. Zhang, A flexible and effective linearization method for subspace learning, Graph...
  • F. Nie et al., Spectral embedded clustering: a framework for in-sample and out-of-sample spectral clustering, IEEE Trans. Neural Networks Learn. Syst., 2011.
  • S. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science, 2000.