Subspace learning via Locally Constrained A-optimal nonnegative projection
Introduction
The explosive growth of high-dimensional data in many real-world applications has made subspace learning a hot topic that attracts considerable interest from the pattern recognition and data mining communities [31]. Generally, subspace learning can essentially be treated as dimensionality reduction, which tackles the curse of dimensionality. Popular methods include Principal Component Analysis (PCA) [1], Locality Preserving Projection (LPP) [21] and Independent Component Analysis (ICA) [25]. Moreover, the recently emerging nonnegative matrix factorization based methods have also been used to learn a low-dimensional data subspace, under the assumption that the original high-dimensional data matrix can be represented by a compact representation of intrinsically low dimension. Fundamentally, these methods aim to learn the compressed subspace spanned by a few dominant column vectors, so there are inherent connections among subspace learning, matrix factorization and dimensionality reduction. Take PCA as an example. As a classical statistical tool, PCA seeks mutually orthogonal basis functions that capture the directions of maximum variance, so as to best preserve the global Euclidean structure of the data points. It performs dimensionality reduction by projecting the original high-dimensional data onto the low-dimensional linear subspace spanned by the leading eigenvectors of the data covariance matrix. In addition, it decomposes the data matrix into two matrices, one of which preserves most of the energy through the principal components.
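As a quick illustration of this projection, the following minimal sketch (our own illustrative code, not the paper's) computes the leading eigenvectors of the sample covariance matrix and projects the centered data onto them:

```python
import numpy as np

# Minimal PCA sketch (illustrative names; not taken from the paper).
# X holds one sample per row; k is the target subspace dimension.
def pca_project(X, k):
    Xc = X - X.mean(axis=0)                # center the data
    C = (Xc.T @ Xc) / (len(X) - 1)         # sample covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]            # leading k eigenvectors as the basis
    return Xc @ W, W                       # low-dimensional codes and basis

X = np.random.rand(100, 50)                # 100 samples in 50 dimensions
Z, W = pca_project(X, 5)                   # Z is the 100 x 5 projection
```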
In this work, we focus on subspace learning from the perspective of matrix factorization. Popular matrix factorization approaches include Singular Value Decomposition (SVD) [26], Nonnegative Matrix Factorization (NMF) [28], and Concept Factorization (CF) [38]. SVD, one of the most frequently used methods, factorizes a matrix into three parts: two orthogonal matrices and one diagonal matrix. NMF finds two nonnegative matrices whose product closely approximates the original matrix. The nonnegativity constraints are believed to yield a parts-based representation since they allow only additive, not subtractive, combinations [27]. To project the high-dimensional data points onto the low-dimensional subspace, we employ NMF to derive a set of bases and the projected vectors under these bases. Many variants of NMF have been proposed to address different issues, such as Graph-regularized Nonnegative Matrix Factorization (GNMF) [9] and Sparse Nonnegative Matrix Factorization (SNMF) [22]. Very recently, A-optimal Nonnegative Projection (ANP) [34] was proposed for image representation; it employs ridge regression to regularize the encoding factor in the matrix decomposition from a statistical viewpoint. However, ANP does not respect the local geometrical structure of the data, which may lead to local inconsistency in the new data subspace. To address this issue, we explicitly consider the manifold structure in data projection through manifold learning, whose benefits have been demonstrated in pattern recognition and data analysis [4], [6], [10].
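For reference, the classical multiplicative update rules of Lee and Seung [27], [28] for the plain NMF objective can be sketched as below; this is generic NMF, not the LCA or ANP algorithm developed in this paper:

```python
import numpy as np

# Lee-Seung multiplicative updates for X ~ U @ V with nonnegative factors;
# a generic NMF sketch under the Frobenius-norm objective ||X - UV||_F^2.
def nmf(X, k, n_iter=200, eps=1e-10):
    rng = np.random.default_rng(0)
    U = rng.random((X.shape[0], k))            # basis matrix
    V = rng.random((k, X.shape[1]))            # encoding (projected vectors)
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)   # update encodings
        U *= (X @ V.T) / (U @ V @ V.T + eps)   # update bases
    return U, V
```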
Moreover, we often encounter situations where some prior knowledge is available as side information, which can benefit many learning approaches. Semi-supervised learning is a popular paradigm that enhances learning performance by exploiting a large amount of unlabeled data together with limited labeled data [41], [11], [42]. Motivated by this, we also explore how to incorporate supervised information to guide the matrix factorization. Specifically, we use two kinds of label constraints as prior knowledge, i.e., a weak constraint and a hard constraint [33], to strengthen the discriminating power of the derived subspace. Hence, in this paper, we present a novel subspace learning algorithm named Locally Constrained A-optimal nonnegative projection, LCA for short. The central goal is to learn a low-dimensional subspace spanned by projected vectors that are dominant and essential for preserving the intrinsic geometrical structure and for incorporating the prior knowledge. To ensure its effectiveness, we assume that the high-dimensional data points are drawn from a low-dimensional submanifold embedded in the ambient space and satisfy the manifold assumption [2], [4], [7], i.e., two data points are likely to share labels or features if they are close to each other. Once the informative data subspace is obtained, it readily supports subspace segmentation, clustering, classification and regression.
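The manifold assumption is commonly operationalized by building a k-nearest-neighbor affinity graph over the data and using its Laplacian as a regularizer. A minimal sketch with our own illustrative names (0/1 edge weights for brevity, though heat-kernel weights are also common):

```python
import numpy as np

# Build a k-nearest-neighbor affinity graph and its Laplacian L = D - W.
def knn_laplacian(X, k=5):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]              # k nearest neighbors (skip self)
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                               # symmetrize the affinity graph
    return np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
```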
It is worth highlighting the main contributions of this paper below:
- We present a novel subspace learning approach, Locally Constrained A-optimal nonnegative projection (LCA), which draws on manifold learning and semi-supervised learning and is built on nonnegative matrix factorization. The method learns an informative subspace that preserves the intrinsic geometrical structure and constrains the projected vectors under the supervision of prior knowledge. It also inherits the advantage of NMF, yielding a parts-based data representation.
- LCA models the data points in the new subspace with a ridge regression model and imposes the constraint on the projected vectors as a regression regularizer. In addition, it guarantees the local consistency of the low-dimensional subspace by encoding the neighborhood structure in a manifold regularizer. In this way, the expected regression error is reduced and the local geometrical structure is preserved in the derived data subspace (a plausible form of the resulting objective is sketched after this list).
- To take advantage of side information, LCA incorporates the available label information into the nonnegative projection through two different kinds of constraints. Under the guidance of this prior knowledge, the learned data subspace gains more discriminating ability, which facilitates many real-world applications, e.g., subspace segmentation and clustering.
- To examine the performance of LCA, extensive experiments were carried out on subspace clustering tasks over both image and document databases. The promising results indicate that our method achieves satisfactory performance compared with other methods.
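For concreteness, the ingredients above can be assembled into a composite objective. The following is a sketch based on the standard ANP, GNMF and CNMF formulations, with our own trade-off parameters $\alpha$, $\beta$, $\lambda$, rather than the paper's exact equation:

$$\min_{\mathbf{U}\ge 0,\,\mathbf{V}\ge 0}\ \|\mathbf{X}-\mathbf{U}\mathbf{V}^{\top}\|_F^2+\alpha\,\operatorname{Tr}\!\left[(\mathbf{V}^{\top}\mathbf{V}+\lambda\mathbf{I})^{-1}\right]+\beta\,\operatorname{Tr}\!\left(\mathbf{V}^{\top}\mathbf{L}\mathbf{V}\right),$$

where $\mathbf{X}\in\mathbb{R}^{d\times n}$ is the data matrix, $\mathbf{U}\in\mathbb{R}^{d\times k}$ the basis, $\mathbf{V}\in\mathbb{R}^{n\times k}$ the encoding of projected vectors, and $\mathbf{L}=\mathbf{D}-\mathbf{W}$ the Laplacian of the neighborhood graph. The A-optimal term bounds the expected variance of a ridge regression estimator fitted on the codes, while the Laplacian term enforces local consistency. A CNMF-style hard label constraint would further parameterize $\mathbf{V}=\mathbf{A}\mathbf{Z}$ with an indicator-based constraint matrix $\mathbf{A}$ built from the labels, so that same-label points receive identical codes.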
The remainder of this paper is structured as follows. First, we briefly review the related work in Section 2. Then in Section 3 we introduce our proposed Locally Constrained A-optimal nonnegative projection method. Section 4 shows the experimental results and some analysis. Finally, we provide some concluding remarks and suggestions for future work in Section 5.
Section snippets
Related work
In this section, we review some works that are closely related to the proposed method. Since this paper explores subspace learning from the perspective of matrix factorization, the following will concentrate on matrix factorization based learning approaches.
As mentioned earlier, Principal Component Analysis (PCA) [1] is regarded as a popular method for both dimensionality reduction and matrix factorization to derive a low-dimensional data subspace. It factorizes the original data matrix into
Our method
In this section, we introduce the proposed LCA method and derive its optimization framework. Since LCA is fundamentally based on ANP, we first give a brief description of ANP.
Experiments
In this section, we examine the performance of the proposed LCA approach under two kinds of label constraints through subspace clustering on several benchmark data sets. First, we give a brief description of the data sets. Then, we describe the experimental design, including the evaluation metrics, compared methods and parameter settings. Finally, we report the experimental results along with some analysis. We further provide the model selection to give an intuitive scenario for choosing the
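Although the snippet above is cut off before naming the paper's exact protocol, clustering accuracy and normalized mutual information (NMI) are the metrics commonly used in such subspace clustering evaluations. As a hypothetical illustration, NMI for a learned encoding could be computed as:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Score a learned encoding V (one row per sample) by clustering it and
# comparing against ground-truth labels with NMI; illustrative only.
def cluster_nmi(V, labels, n_clusters):
    pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
    return normalized_mutual_info_score(labels, pred)
```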
Conclusion
In this paper, we have presented a novel subspace learning method from the perspective of matrix factorization, i.e., Locally Constrained A-optimal nonnegative projection (LCA). This approach is essentially based on A-optimal nonnegative projection [34] and focuses on two primary problems, namely how to preserve the local structure in the new subspace and how to perform nonnegative projection under the guidance of supervised information. In particular, motivated by the recent
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 91120302, 61222207, 61173185 and 61173186, the National Basic Research Program of China (973 Program) under Grant 2013CB336500, the Fundamental Research Funds for the Central Universities under Grant 2012FZA5017, and the Zhejiang Province Key S&T Innovation Group Project under Grant 2009R50009.
References (42)
- et al., Discriminative concept factorization for data representation, Neurocomputing (2011)
- et al., Clustering analysis using manifold kernel concept factorization, Neurocomputing (2012)
- et al., Image representation using Laplacian regularized nonnegative tensor factorization, Pattern Recognition (2011)
- et al., Principal component analysis, Wiley Interdiscip. Rev. Comput. Stat. (2010)
- M. Belkin, I. Matveeva, P. Niyogi, Regularization and semi-supervised learning on large graphs, in: Proceedings of the...
- et al., Laplacian eigenmaps and spectral techniques for embedding and clustering, Adv. Neural Inf. Process. Syst. (2002)
- et al., Manifold regularization: a geometric framework for learning from examples, J. Mach. Learn. Res. (2006)
- et al., Convex Optimization (2004)
- et al., Manifold adaptive experimental design for text categorization, IEEE Trans. Knowl. Data Eng. (2012)
- et al., Locally consistent concept factorization for document clustering, IEEE Trans. Knowl. Data Eng. (2011)
- Speed up kernel discriminant analysis, VLDB J.
- Graph regularized nonnegative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell.
- NeNMF: an optimal gradient method for nonnegative matrix factorization, IEEE Trans. Signal Process.
Cited by (15)
- Discriminative semi-supervised non-negative matrix factorization for data clustering. 2021, Engineering Applications of Artificial Intelligence.
  Citation excerpt: And as a result, those data points that have same label are represented as a single point. Some other semi-supervised NMF approach were proposed by using the tricks in GNMF and CNMF to increase the locally structure preserving and discrimination of new representation (Li et al., 2013; Sun et al., 2016). In Babaee et al. (2016), the authors proposed a semi-supervised NMF called Discriminative NMF (DNMF), which utilizes the label information of a fraction of data as a discriminative constraint.
- Coupled local–global adaptation for multi-source transfer learning. 2018, Neurocomputing.
  Citation excerpt: Xiong et al. [22] verify that exploiting manifold structure of latent domains can further modeling the dataset. Li et al. [23–25] investigate the manifold structures under subspace learning. For multi-source domain adaptation, another possible solution is to learn multiple cross-domain transforms, one for each source-target pair [26].
- Discriminative non-negative matrix factorization (DNMF) and its application to the fault diagnosis of diesel engine. 2017, Mechanical Systems and Signal Processing.
  Citation excerpt: In 2013, Liu et al. [29] proposed a semi-supervised NMF approach, namely Constrained NMF (CNMF) by introducing the label information into the main objective function of NMF. In the same year, Li et al. [30] proposed a semi-supervised NMF method by using the tricks in GNMF and CNMF. The locality preserving and discrimination of data representation have been increased respectively.
- Discriminative Nonnegative Matrix Factorization for dimensionality reduction. 2016, Neurocomputing.
  Citation excerpt: In another proposed semi-supervised NMF approach, namely Subspace Learning via Locally Constrained A-optimal nonnegative projection (LCA), a semi-supervised locally structure preserving is proposed, in which the tricks in GNMF and CNMF are used to increase the locality preserving and discrimination of new representation, respectively. However, the points with same label are also projected to a single point [17]. Through the paper, we use hard constraint LCA (LCA-H in the original paper) as the used LCA method.
- Manifold optimal experimental design via dependence maximization for active learning. 2014, Neurocomputing.
  Citation excerpt: To overcome this drawback, some methods utilize both measured and unmeasured samples to actively select the most informative points, e.g., Transductive Experimental Design (TED) [30] evaluates the average prediction variance on the pre-given unseen data based on I-optimal design. Nevertheless, TED does not consider the local manifold structure of the data space, which is of vital importance in active learning, since naturally occurring data often reside on a lower dimensional sub-manifold of the ambient Euclidean space [3,16,18]. To handle this deficit, Laplacian regularized D-optimal design (LapRDD) [13] was proposed, where the loss function is defined on both labeled and unlabeled points with an imposed locality preserving regularizer, which has been adopted in several learning methods to improve the performance [15,17].
Ping Li is currently pursuing the PhD degree in computer science at Zhejiang University. He received the MS degree in information and communication engineering from Central South University, China, in 2010. His research interests include machine learning, data mining and information retrieval.
Jiajun Bu received the BS and PhD degrees in computer science from Zhejiang University, China, in 1995 and 2000, respectively. He is a professor in the College of Computer Science, Zhejiang University. His research interests include embedded systems, data mining, information retrieval and mobile databases.
Chun Chen received the BS degree in mathematics from Xiamen University, China, in 1981, and his MS and PhD degrees in computer science from Zhejiang University, China, in 1984 and 1990, respectively. He is a professor in the College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.
Can Wang received the BS degree in economics, and the MS and PhD degrees in computer science, from Zhejiang University, China, in 1995, 2003 and 2009, respectively. He is currently a faculty member in the College of Computer Science at Zhejiang University. His research interests include information retrieval, machine learning and information accessibility for the disabled.
Deng Cai received the PhD degree in computer science from the University of Illinois at Urbana-Champaign in 2009. Before that, he received the BS and MS degrees from Tsinghua University, China, in 2000 and 2003, respectively, both in automation. He is an associate professor in the State Key Lab of CAD&CG, College of Computer Science at Zhejiang University, China. His research interests include machine learning, data mining and information retrieval.