Elsevier

Neurocomputing

Volume 120, 23 November 2013, Pages 441-452

L1-graph construction using structured sparsity

https://doi.org/10.1016/j.neucom.2013.03.045

Abstract

As a powerful model for representing data, the graph has been widely applied to many machine learning tasks. Notably, to address the problems associated with traditional graph construction methods, sparse representation has been successfully used for graph construction; one typical example is the L1-graph. However, because the L1-graph tends to ignore the intrinsic structure hidden in the data, it often establishes only part of the valuable connections between data points and thus fails to pass this important information on to subsequent machine learning. Moreover, the high computational cost of the L1-graph prevents it from being applied to large scale high-dimensional datasets. In this paper, we construct a new graph, called the k-nearest neighbor (k-NN) fused Lasso graph, which differs from the traditional L1-graph in that it incorporates structured sparsity into the graph construction process and scales to large complex datasets. More concretely, to induce the structured sparsity, a novel regularization term is defined and reformulated into matrix form so that it fits into the sparse representation step of L1-graph construction, and the k-NN method and the kernel method are employed to deal with large complex datasets. Experimental results on several complex image datasets demonstrate the promising performance of our k-NN fused Lasso graph and its advantage over the traditional L1-graph in the task of spectral clustering.

Introduction

Since the graph is a powerful model for representing data, it has served as the foundation for many machine learning problems, such as spectral clustering [1], [2], semi-supervised learning [3], [4], and dimension reduction [5]. Although many graph-based methods have been developed for different machine learning tasks, graph construction itself still receives relatively little attention, as pointed out in [6], [7]. In the literature, two strategies are commonly used for graph construction, namely the k-nearest neighbor (k-NN) and ϵ-ball methods. Although these methods are easy to understand and to implement, they suffer from inherent limitations, e.g. data dependency and sensitivity to noise.

Recently, to address these problems, sparse representation [8] has been successfully used for graph construction; one typical example is the L1-graph [9], [10]. The success of the L1-graph lies in its sparse representation step, which seeks a sparse linear reconstruction of each data point from the other data points by exploiting the sparsity-inducing property of the Lasso penalty [11]. This is, in fact, a way of measuring the similarity between data points that is fundamentally different from the traditional measures (Euclidean distance, cosine distance, etc.). By inducing sparsity in the linear reconstruction, it identifies the most relevant data points together with their estimated similarity to the reconstructed data point, and thereby yields a graph that proves effective in subsequent graph-based machine learning tasks.
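The idea above can be sketched in a few lines: treat all other data points as a dictionary and solve a Lasso problem to reconstruct one point. This is a minimal illustration of the sparse-representation principle, not the paper's exact formulation; the data, `alpha`, and the zero threshold are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))          # 50 data points in R^20

i = 0                                  # reconstruct point 0 from the others
A = np.delete(X, i, axis=0).T          # dictionary: remaining points as columns
lasso = Lasso(alpha=0.1)
lasso.fit(A, X[i])
w = lasso.coef_                        # sparse reconstruction coefficients

# the few nonzero coefficients identify the "relevant" points and
# double as edge weights in the constructed graph
nonzero = np.flatnonzero(np.abs(w) > 1e-8)
print(len(nonzero), "of", len(w), "coefficients are nonzero")
```

The nonzero pattern of `w` selects the neighborhood, and its magnitudes serve as edge weights, which is what distinguishes this construction from fixed-radius or fixed-k schemes.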

However, two interesting comments from previous works on the Lasso method and the L1-graph attract our attention:

  • (1)

    As reported in [12], when faced with a group of highly correlated variables, the Lasso method tends to select only one of them, essentially at random.

  • (2)

    In [10], the authors stated that “for certain extreme cases, e.g. if we simply duplicate each sample and generate another new dataset of double size, L1-graph may only connect these duplicated pairs”.

With some simple mathematical derivations, shown in Section 2, we can see the similarity between the sparse representation step of L1-graph construction and the Lasso method. Consequently, if we regard the data points and the similarities between them in the sparse representation step as the variables and the correlations between variables in the Lasso method, respectively, the first comment suggests that the sparse representation does not connect all the data points that should be connected. In the situation described by the second comment, the similarity between the reconstructed data point and its duplicate (as measured by the sparse representation method of the L1-graph) dominates all others, which again makes the sparse representation step ignore many other valuable connections.
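The first comment is easy to verify empirically: give the Lasso two identical copies of the same predictor and it concentrates the weight on one of them rather than splitting it. A small demonstration (the data and `alpha` are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
x = rng.normal(size=100)
# design matrix with two identical columns plus an irrelevant one
A = np.column_stack([x, x, rng.normal(size=100)])
y = x + 0.01 * rng.normal(size=100)

w = Lasso(alpha=0.05).fit(A, y).coef_
# the weight lands almost entirely on one of the two duplicates
print(w[:2])
```

In the graph-construction analogy, the duplicate column that receives zero weight corresponds to a valuable connection that the L1-graph silently drops.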

In addition, advances in technology have made large scale high-dimensional datasets common in many scientific disciplines. Yet the computational cost of L1-graph construction becomes unbearable on such datasets because of the huge matrix it builds (details are given in Section 2), which prevents the L1-graph from being applied to problems involving these large complex datasets.

Our work aims to overcome these shortcomings of the L1-graph. To avoid its failure to establish all valuable connections between data points, we incorporate structured sparsity into the L1-graph construction process. The main idea is to exploit the local structure of the dataset by requiring, in the linear reconstruction of the sparse representation step, that the reconstruction coefficients of every data point and its nearest neighbors also be close to each other in value. To achieve this, we propose a novel regularization term, which makes use of the information provided by traditional similarity measures, for the sparse representation step of L1-graph construction to induce structured sparsity, and reformulate it in matrix form to fit into our new graph construction process (we call the resulting graph the k-NN fused Lasso graph). To deal with large scale high-dimensional datasets, we employ the k-NN method and the kernel method in the new construction process. More specifically, to handle large scale datasets we reconstruct each data point, and construct the corresponding regularization term, using only its k nearest neighbors; and to handle high-dimensional datasets we solve the linear reconstruction problem with the kernel matrix instead of the original data vectors. The effectiveness of the k-NN fused Lasso graph is verified by experimental results on several large complex image datasets in the task of spectral clustering. To give a first impression of its effectiveness, the similarity (i.e. weight) matrix of our new graph on the doubled soybean dataset is illustrated in Fig. 1(b) (the soybean dataset, which contains 47 35-dimensional instances, can be downloaded from the UCI Machine Learning Repository [20]; the doubled version is generated by making an exact duplicate of each data point in the original dataset). Comparing it with Fig. 1(a), we can easily see the clear advantage of our new graph over the L1-graph.

Our main contribution is the development of the new k-NN fused Lasso graph construction method. To be more specific, our contributions can be summarized as follows:

  • (1)

    We proposed a novel regularization term to induce structured sparsity.

  • (2)

    We designed a reformulation strategy to incorporate the new regularization term into the graph construction process.

  • (3)

    We successfully employed the k-NN method and kernel method to make our graph construction method applicable to large scale high-dimensional datasets.

The idea of linearly reconstructing a given data point from its neighbors also appears in other works, e.g. the locally linear embedding method [22] for dimension reduction. Unlike our method, however, these works do not pay much attention to the reconstruction process itself. In [23], the authors proposed a unifying framework for dimension reduction called patch alignment. In our graph construction process, by using the k-NN method, we also construct a patch for each data point, and conducting the sparse representation step is similar to the part optimization in [23]. But unlike [23], we have no whole-alignment step: we run the sparse representation step on each patch and, at the end, unify the results into the final similarity matrix by symmetrizing the similarity matrix assembled from the individual sparse representation steps. The idea of exploring the dataset structure in a pairwise manner is also present in previous works, e.g. the max-min distance analysis [24]. But [24] uses the pairwise distance between different classes, while our method focuses on the pairwise distance between the reconstruction coefficients of different data points. Moreover, our method is unsupervised in nature: it requires no prior information such as class labels, which makes it applicable to many unsupervised or semi-supervised problems and distinguishes it from previous works like [24], [14], as well as from the Group Sparse MahNMF in [25]. Like the elastic net [12], our new regularization term has a certain grouping effect, but we promote this effect in a pairwise manner with the L1 norm, which makes our method perform differently from the elastic net [12] as well as elastic-net-based works such as the Elastic Net Inducing MahNMF [25] and the Manifold Elastic Net [26].
To the best of our knowledge, we have made the first attempt to incorporate the structured sparsity into the L1-graph construction process, and the fact that our new k-NN fused Lasso graph outperforms the traditional k-NN graph and L1-graph (see later experimental results in Section 6) when applied to spectral clustering on large complex image datasets demonstrates the great value of the structured sparsity information we utilize in our new method.
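The symmetrization step mentioned above can be sketched in a few lines. The max rule used here, which keeps an edge whenever either direction of the sparse reconstruction found it, is a common convention and an assumption on our part, not necessarily the paper's exact rule.

```python
import numpy as np

def symmetrize(W):
    """Turn the directed similarity matrix assembled from per-point
    sparse representations into a symmetric one (max rule)."""
    W = np.abs(W)
    return np.maximum(W, W.T)

# toy directed similarities from three hypothetical reconstructions
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.3],
              [0.5, 0.0, 0.0]])
S = symmetrize(W)
```

Averaging (W + W.T) / 2 is an equally common alternative; the choice mainly affects how one-sided edges are weighted.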

The rest of the paper will be organized as follows. In Section 2, we briefly review the L1-graph construction method. In Section 3, we describe in detail how we overcome the shortcomings of L1-graph by the new regularization term, k-NN method and kernel method, and summarize the L1-graph construction process used in this paper, which also makes use of the k-NN method and kernel method to deal with large complex image datasets. In Section 4, we summarize the construction process of the new k-NN fused Lasso graph. In Section 5, we compare the computational complexity of the different graph construction methods. Section 6 provides the experimental results to verify the effectiveness of our new k-NN fused Lasso graph when applied to spectral clustering, and Section 7 gives conclusions.

Section snippets

L1-graph construction

In this section, we give a brief review of the closely related L1-graph construction method [10]. Suppose we have a set of data points a1, a2, …, an, represented as column vectors (ai ∈ R^m). Our goal is to construct an L1-graph based on these data points. Motivated by the limitations of the traditional graph construction methods mentioned above, the L1-graph construction method seeks to determine the neighborhood and the edge weights simultaneously. The corresponding
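The snippet is cut off before the optimization problem. For reference, the standard L1-graph formulation from [10], reproduced here as an assumption about the form this paper's Eq. (4) takes, reconstructs each $a_i$ from the dictionary of all other points $B_i = [a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n]$:

```latex
\min_{w_i \in \mathbb{R}^{n-1}} \; \|w_i\|_1
\quad \text{s.t.} \quad a_i = B_i w_i ,
```

and a noise-tolerant variant relaxes the equality constraint into a Lasso problem, $\min_{w_i} \|a_i - B_i w_i\|_2^2 + \lambda \|w_i\|_1$. The nonzero entries of $w_i$ then determine both the neighbors of $a_i$ and the corresponding edge weights.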

New regularization term

Following the discussion in Section 2, we can see that, in essence, the sparse representation step of the L1-graph construction process solves the optimization problem given by Eq. (4). As discussed in Section 1, when the data point ai is linearly reconstructed from the other data points, it is reasonable to assume that the reconstruction coefficients of every data point and its nearest neighbors should be similar in value. Since in most cases we do not have prior knowledge about the
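Although the snippet is truncated, a pairwise fused-Lasso term of the form ∑ |w_p − w_q| over neighbor pairs can always be written as ||C w||_1 with one signed row per pair. The sketch below shows that reformulation under this assumption; the paper's actual Eqs. (12)–(16) may additionally weight each pair by a traditional similarity measure.

```python
import numpy as np

def fused_penalty_matrix(pairs, dim):
    """Build C so that ||C @ w||_1 equals sum over neighbor pairs (p, q)
    of |w[p] - w[q]|: each row has +1 at p and -1 at q."""
    C = np.zeros((len(pairs), dim))
    for r, (p, q) in enumerate(pairs):
        C[r, p] = 1.0
        C[r, q] = -1.0
    return C

pairs = [(0, 1), (1, 2)]                 # hypothetical k-NN pairs
C = fused_penalty_matrix(pairs, 4)
w = np.array([1.0, 1.0, 0.5, 0.0])
penalty = np.abs(C @ w).sum()            # |w0-w1| + |w1-w2|
```

Writing the term as ||C w||_1 is what lets it slot into the same L1-norm machinery used by the sparse representation step.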

k-NN fused Lasso graph construction

Finally, we summarize the graph construction process for our k-NN fused Lasso graph as follows:

  • (1)

    Initialization: Choose the values of the parameters k1, k2 such that k2 ≤ k1. The input is a kernel matrix K = [Kij]n×n derived from the set of data points a1, a2, …, an ∈ R^m.

  • (2)

    Sparse representation: For each data point ai, we determine its k1 nearest neighbors ai1, …, aik2, …, aik1. We construct the new regularization term with Eq. (12) and obtain the matrix Ci via Eqs. (13), (14), (15), (16). Let Bi = [Kipiq]k1×k1, where p, q =
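The patch-wise construction outlined above can be sketched end to end. To keep the sketch self-contained, the inner fused-Lasso solve is replaced by a plain Lasso on raw feature vectors (a stand-in: the paper works in kernel space and adds the fused regularizer), and the max rule is assumed for symmetrization.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

def knn_sparse_graph(X, k=10, alpha=0.1):
    """Reconstruct each point from its k nearest neighbors only
    (rather than all n-1 points), then symmetrize the result."""
    n = X.shape[0]
    W = np.zeros((n, n))
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)
    for i in range(n):
        neigh = idx[i, 1:]          # drop the point itself
        # stand-in solver: plain Lasso instead of the fused-Lasso step
        w = Lasso(alpha=alpha).fit(X[neigh].T, X[i]).coef_
        W[i, neigh] = np.abs(w)
    return np.maximum(W, W.T)       # symmetrize (max rule, an assumption)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
W = knn_sparse_graph(X, k=10)
```

Restricting each solve to k neighbors is what makes the per-point problem size independent of n, which is the source of the scalability claimed for the k-NN fused Lasso graph.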

Computational complexity analysis

In this section, we give a theoretical analysis of the computational complexity of the graph construction methods involved in this paper. Four different graph construction methods are used: the k-NN graph, the original L1-graph (described in Section 2), the L1-graph with the k-NN method and kernel method (described in Section 3.2 and used as the L1-graph in the experiments in Section 6; we will refer to it as the new L1-graph for convenience), and our new k-NN

Experimental results

In this section, we conduct a variety of experiments to demonstrate the effectiveness of our k-NN fused Lasso graph and its advantage over the traditional L1-graph. We focus on testing the k-NN fused Lasso graph in the task of spectral clustering, leaving other graph-based machine learning tasks aside.
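Plugging a precomputed graph into spectral clustering is straightforward with standard tooling. In this sketch a simple Gaussian-kernel affinity on toy data stands in for the k-NN fused Lasso graph; any symmetric nonnegative weight matrix can be substituted via `affinity='precomputed'`.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# two well-separated blobs as toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(5, 0.3, size=(20, 2))])

# Gaussian-kernel affinity as a stand-in for the constructed graph
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-d2)                          # symmetric, nonnegative

labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(W)
```

The clustering quality thus depends entirely on the quality of `W`, which is why the graph construction method is the variable under study in the experiments.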

Conclusion

In this paper, we have proposed a new graph construction method based on the L1-graph and recent developments in sparse representation. Our main motivation is to overcome the traditional L1-graph's tendency to ignore the intrinsic structure of the data and to help it convey more valuable information in order to improve its performance. We have incorporated structured sparsity into our L1-graph construction, and more notably, the successful employment of the k-NN method and kernel

Acknowledgments

This work was supported by National Natural Science Foundation of China under Grants 61073084 and 61202231, Beijing Natural Science Foundation of China under Grants 4122035 and 4132037, Ph.D. Programs Foundation of Ministry of Education of China under Grant 20120001110097, and National Hi-Tech Research and Development Program (863 Program) of China under Grant 2012AA012503.

Zhiwu Lu received the M.Sc. degree in applied mathematics from Peking University, Beijing, China, in 2005, and the Ph.D. degree in computer science from City University of Hong Kong in 2011. Since March 2011, he has been an assistant professor with the Institute of Computer Science and Technology, Peking University. He has published over 30 papers in international journals and conference proceedings including TIP, TSMC-B, TMM, AAAI, ICCV, CVPR, ECCV, and ACM-MM. His research interests lie in machine learning, computer vision, and multimedia information retrieval.

References (26)

  • J. Shi, J. Malik, Normalized cuts and image segmentation, in: Proceedings of CVPR, 1997, pp. ...
  • Z. Lu, H. Ip, Constrained spectral clustering via exhaustive and efficient constraint propagation, in: Proceedings of ...
  • D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: Advances in ...
  • X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in: ...
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • X. Zhu, Semi-supervised learning literature survey, ...
  • P.P. Talukdar, Topics in Graph Construction for Semi-Supervised Learning, Technical Report, ...
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • S. Yan, H. Wang, Semi-supervised learning by sparse representation, in: Proceedings of SDM, 2009, pp. ...
  • B. Cheng et al., Learning with l1-graph for image analysis, IEEE Trans. Image Process. (2010)
  • R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B (1996)
  • H. Zou et al., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B (2005)
  • R. Tibshirani et al., Sparsity and smoothness via the fused Lasso, J. R. Stat. Soc. Ser. B (2005)


Yuxin Peng is a professor and the director of the Multimedia Information Processing Lab (MIPL) in the Institute of Computer Science and Technology (ICST), Peking University. He received the Ph.D. degree in computer application from the School of Electronics Engineering and Computer Science (EECS), Peking University, in July 2003, after which he worked as an assistant professor in ICST, Peking University. From August 2003 to November 2004, he was a visiting scholar with the Department of Computer Science, City University of Hong Kong. He was promoted to associate professor at Peking University in August 2005 and to professor in August 2010. In 2006, he was selected for the "Program for New Star in Science and Technology of Beijing" and the "Program for New Century Excellent Talents in University (NCET)". He has published over 50 papers in international journals and conference proceedings including TCSVT, TIP, ACM-MM, ICCV, CVPR, and AAAI. In 2009, he led his team to participate in TRECVID; across the six tasks of high-level feature extraction (HLFE) and search, his team won first place in four tasks and second place in the remaining two. He has also obtained 12 patents. His current research interests include multimedia information retrieval, computer vision, and pattern recognition.
