Elsevier

Neurocomputing

Volume 120, 23 November 2013, Pages 441-452

L1-graph construction using structured sparsity

https://doi.org/10.1016/j.neucom.2013.03.045

Abstract

As a powerful model for representing data, the graph has been widely applied to many machine learning tasks. Notably, to address the problems associated with traditional graph construction methods, sparse representation has been successfully used for graph construction; one typical example is the L1-graph. However, because the L1-graph tends to ignore the intrinsic structure hidden in the data, it often establishes only part of the valuable connections between data points and thus fails to pass this important information on to subsequent machine learning. Moreover, the high computational cost of the L1-graph prevents it from being applied to large scale high-dimensional datasets. In this paper, we construct a new graph, called the k-nearest neighbor (k-NN) fused Lasso graph, which differs from the traditional L1-graph in that it incorporates structured sparsity into the graph construction process and scales to large complex datasets. More concretely, to induce the structured sparsity, a novel regularization term is defined and reformulated into matrix form so that it fits into the sparse representation step of L1-graph construction, and the k-NN method and the kernel method are employed to deal with large complex datasets. Experimental results on several complex image datasets demonstrate the promising performance of our k-NN fused Lasso graph and its advantage over the traditional L1-graph in the task of spectral clustering.

Introduction

Since the graph is a powerful model for representing data, it has served as the foundation for many machine learning problems, such as spectral clustering [1], [2], semi-supervised learning [3], [4], and dimension reduction [5]. Although many graph-based methods have been developed for different machine learning tasks, graph construction itself still receives relatively little attention, as pointed out in [6], [7]. In the literature, two strategies are commonly used for graph construction, namely the k-nearest neighbor (k-NN) and ϵ-ball methods. Although these methods are easy to understand and to implement, they suffer from inherent limitations, e.g. data dependency and sensitivity to noise.

Recently, to address these problems, sparse representation [8] has been successfully used for graph construction; one typical example is the L1-graph [9], [10]. The success of the L1-graph lies in its sparse representation step, which seeks a sparse linear reconstruction of each data point from the other data points by exploiting the sparsity-inducing property of the Lasso penalty [11]. This is, in fact, a way of measuring the similarity between data points that is fundamentally different from the traditional measures (Euclidean distance, cosine distance, etc.). By inducing sparsity in the linear reconstruction, it identifies the most relevant data points together with their estimated similarity to the reconstructed data point, and thereby yields a graph that proves effective in subsequent graph-based machine learning tasks.
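The idea above can be sketched in a few lines: treat all other data points as a dictionary and solve a Lasso problem to reconstruct one point. This is a minimal illustration of the sparse-representation principle, not the paper's exact formulation; the data, `alpha`, and the zero threshold are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))          # 50 data points in R^20

i = 0                                  # reconstruct point 0 from the others
A = np.delete(X, i, axis=0).T          # dictionary: remaining points as columns
lasso = Lasso(alpha=0.1)
lasso.fit(A, X[i])
w = lasso.coef_                        # sparse reconstruction coefficients

# the few nonzero coefficients identify the "relevant" points and
# double as edge weights in the constructed graph
nonzero = np.flatnonzero(np.abs(w) > 1e-8)
print(len(nonzero), "of", len(w), "coefficients are nonzero")
```

The nonzero pattern of `w` selects the neighborhood, and its magnitudes serve as edge weights, which is what distinguishes this construction from fixed-radius or fixed-k schemes.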

However, two interesting comments from previous works on the Lasso method and the L1-graph attract our attention:

  • (1)

    As reported in [12], when faced with a group of highly correlated variables, the Lasso method tends to select only one of them, essentially at random.

  • (2)

    In [10], the authors stated that “for certain extreme cases, e.g. if we simply duplicate each sample and generate another new dataset of double size, L1-graph may only connect these duplicated pairs”.

With some simple mathematical derivations, shown in Section 2, we can see the similarity between the sparse representation step of L1-graph construction and the Lasso method. Consequently, if we regard the data points and the similarities between them in the sparse representation step as the variables and the correlations between variables in the Lasso method, respectively, the first comment suggests that the sparse representation does not connect all the data points that should be connected. In the situation described by the second comment, the similarity between the reconstructed data point and its duplicate (as measured by the sparse representation method of the L1-graph) dominates all others, which again makes the sparse representation step ignore many other valuable connections.
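The first comment is easy to verify empirically: give the Lasso two identical copies of the same predictor and it concentrates the weight on one of them rather than splitting it. A small demonstration (the data and `alpha` are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
x = rng.normal(size=100)
# design matrix with two identical columns plus an irrelevant one
A = np.column_stack([x, x, rng.normal(size=100)])
y = x + 0.01 * rng.normal(size=100)

w = Lasso(alpha=0.05).fit(A, y).coef_
# the weight lands almost entirely on one of the two duplicates
print(w[:2])
```

In the graph-construction analogy, the duplicate column that receives zero weight corresponds to a valuable connection that the L1-graph silently drops.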

In addition, advances in technology have made large scale high-dimensional datasets common in many scientific disciplines. Yet the computational cost of L1-graph construction becomes unbearable on such datasets because of the huge matrix it builds (details are given in Section 2), which prevents the L1-graph from being applied to problems involving these large complex datasets.

Our work aims to overcome these shortcomings of the L1-graph. To avoid its failure to establish all valuable connections between data points, we incorporate structured sparsity into the L1-graph construction process. The main idea is to exploit the local structure of the dataset by requiring, in the linear reconstruction of the sparse representation step, that the reconstruction coefficients of every data point and its nearest neighbors also be close to each other in value. To achieve this, we propose a novel regularization term, which makes use of the information provided by traditional similarity measures, for the sparse representation step of L1-graph construction to induce structured sparsity, and reformulate it in matrix form to fit into our new graph construction process (we call the resulting graph the k-NN fused Lasso graph). To deal with large scale high-dimensional datasets, we employ the k-NN method and the kernel method in the new construction process. More specifically, to handle large scale datasets we reconstruct each data point, and construct the corresponding regularization term, using only its k nearest neighbors; and to handle high-dimensional datasets we solve the linear reconstruction problem with the kernel matrix instead of the original data vectors. The effectiveness of the k-NN fused Lasso graph is verified by experimental results on several large complex image datasets in the task of spectral clustering. To give a first impression of its effectiveness, the similarity (i.e. weight) matrix of our new graph on the doubled soybean dataset is illustrated in Fig. 1(b) (the soybean dataset, which contains 47 35-dimensional instances, can be downloaded from the UCI Machine Learning Repository [20]; the doubled version is generated by making an exact duplicate of each data point in the original dataset). Comparing it with Fig. 1(a), we can easily see the clear advantage of our new graph over the L1-graph.

Our main contribution is the development of the new k-NN fused Lasso graph construction method. To be more specific, our contributions can be summarized as follows:

  • (1)

    We proposed a novel regularization term to induce structured sparsity.

  • (2)

    We designed a reformulation strategy to incorporate the new regularization term into the graph construction process.

  • (3)

    We successfully employed the k-NN method and kernel method to make our graph construction method applicable to large scale high-dimensional datasets.

The idea of linearly reconstructing a given data point from its neighbors also appears in other works, e.g. the locally linear embedding method [22] for dimension reduction. Unlike our method, however, these works do not pay much attention to the reconstruction process itself. In [23], the authors proposed a unifying framework for dimension reduction called patch alignment. In our graph construction process, by using the k-NN method, we also construct a patch for each data point, and conducting the sparse representation step is similar to the part optimization in [23]. But unlike [23], we have no whole-alignment step: we run the sparse representation step on each patch and, at the end, unify the results into the final similarity matrix by symmetrizing the similarity matrix assembled from the individual sparse representation steps. The idea of exploring the dataset structure in a pairwise manner is also present in previous works, e.g. the max-min distance analysis [24]. But [24] uses the pairwise distance between different classes, while our method focuses on the pairwise distance between the reconstruction coefficients of different data points. Moreover, our method is unsupervised in nature: it requires no prior information such as class labels, which makes it applicable to many unsupervised or semi-supervised problems and distinguishes it from previous works like [24], [14], as well as from the Group Sparse MahNMF in [25]. Like the elastic net [12], our new regularization term has a certain grouping effect, but we promote this effect in a pairwise manner with the L1 norm, which makes our method perform differently from the elastic net [12] as well as elastic-net-based works such as the Elastic Net Inducing MahNMF [25] and the Manifold Elastic Net [26].
To the best of our knowledge, we have made the first attempt to incorporate the structured sparsity into the L1-graph construction process, and the fact that our new k-NN fused Lasso graph outperforms the traditional k-NN graph and L1-graph (see later experimental results in Section 6) when applied to spectral clustering on large complex image datasets demonstrates the great value of the structured sparsity information we utilize in our new method.
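The symmetrization step mentioned above can be sketched in a few lines. The max rule used here, which keeps an edge whenever either direction of the sparse reconstruction found it, is a common convention and an assumption on our part, not necessarily the paper's exact rule.

```python
import numpy as np

def symmetrize(W):
    """Turn the directed similarity matrix assembled from per-point
    sparse representations into a symmetric one (max rule)."""
    W = np.abs(W)
    return np.maximum(W, W.T)

# toy directed similarities from three hypothetical reconstructions
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.3],
              [0.5, 0.0, 0.0]])
S = symmetrize(W)
```

Averaging (W + W.T) / 2 is an equally common alternative; the choice mainly affects how one-sided edges are weighted.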

The rest of the paper will be organized as follows. In Section 2, we briefly review the L1-graph construction method. In Section 3, we describe in detail how we overcome the shortcomings of L1-graph by the new regularization term, k-NN method and kernel method, and summarize the L1-graph construction process used in this paper, which also makes use of the k-NN method and kernel method to deal with large complex image datasets. In Section 4, we summarize the construction process of the new k-NN fused Lasso graph. In Section 5, we compare the computational complexity of the different graph construction methods. Section 6 provides the experimental results to verify the effectiveness of our new k-NN fused Lasso graph when applied to spectral clustering, and Section 7 gives conclusions.

Section snippets

L1-graph construction

In this section, we give a brief review of the closely related L1-graph construction method [10]. Suppose we have a set of data points a1, a2, …, an, represented as column vectors (ai ∈ R^m). Our goal is to construct an L1-graph based on these data points. Motivated by the limitations of the traditional graph construction methods mentioned above, the L1-graph construction method seeks to determine the neighborhood and the edge weights simultaneously. The corresponding
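The snippet is cut off before the optimization problem. For reference, the standard L1-graph formulation from [10], reproduced here as an assumption about the form this paper's Eq. (4) takes, reconstructs each $a_i$ from the dictionary of all other points $B_i = [a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n]$:

```latex
\min_{w_i \in \mathbb{R}^{n-1}} \; \|w_i\|_1
\quad \text{s.t.} \quad a_i = B_i w_i ,
```

and a noise-tolerant variant relaxes the equality constraint into a Lasso problem, $\min_{w_i} \|a_i - B_i w_i\|_2^2 + \lambda \|w_i\|_1$. The nonzero entries of $w_i$ then determine both the neighbors of $a_i$ and the corresponding edge weights.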

New regularization term

Following the discussion in Section 2, we can see that, in essence, the sparse representation step of the L1-graph construction process solves the optimization problem given by Eq. (4). As discussed in Section 1, when the data point ai is linearly reconstructed from the other data points, it is reasonable to assume that the reconstruction coefficients of every data point and its nearest neighbors should be similar in value. Since in most cases we do not have prior knowledge about the
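Although the snippet is truncated, a pairwise fused-Lasso term of the form ∑ |w_p − w_q| over neighbor pairs can always be written as ||C w||_1 with one signed row per pair. The sketch below shows that reformulation under this assumption; the paper's actual Eqs. (12)–(16) may additionally weight each pair by a traditional similarity measure.

```python
import numpy as np

def fused_penalty_matrix(pairs, dim):
    """Build C so that ||C @ w||_1 equals sum over neighbor pairs (p, q)
    of |w[p] - w[q]|: each row has +1 at p and -1 at q."""
    C = np.zeros((len(pairs), dim))
    for r, (p, q) in enumerate(pairs):
        C[r, p] = 1.0
        C[r, q] = -1.0
    return C

pairs = [(0, 1), (1, 2)]                 # hypothetical k-NN pairs
C = fused_penalty_matrix(pairs, 4)
w = np.array([1.0, 1.0, 0.5, 0.0])
penalty = np.abs(C @ w).sum()            # |w0-w1| + |w1-w2|
```

Writing the term as ||C w||_1 is what lets it slot into the same L1-norm machinery used by the sparse representation step.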

k-NN fused Lasso graph construction

Finally, we summarize the graph construction process for our k-NN fused Lasso graph as follows:

  • (1)

    Initialization: Choose the values of the parameters k1, k2 such that k2 ≤ k1. The input is a kernel matrix K = [Kij]n×n derived from the set of data points a1, a2, …, an ∈ R^m.

  • (2)

    Sparse representation: For each data point ai, we determine its k1 nearest neighbors ai1, …, aik2, …, aik1. We construct the new regularization term with Eq. (12) and obtain the matrix Ci via Eqs. (13), (14), (15), (16). Let Bi = [Kipiq]k1×k1, where p, q =
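The patch-wise construction outlined above can be sketched end to end. To keep the sketch self-contained, the inner fused-Lasso solve is replaced by a plain Lasso on raw feature vectors (a stand-in: the paper works in kernel space and adds the fused regularizer), and the max rule is assumed for symmetrization.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

def knn_sparse_graph(X, k=10, alpha=0.1):
    """Reconstruct each point from its k nearest neighbors only
    (rather than all n-1 points), then symmetrize the result."""
    n = X.shape[0]
    W = np.zeros((n, n))
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)
    for i in range(n):
        neigh = idx[i, 1:]          # drop the point itself
        # stand-in solver: plain Lasso instead of the fused-Lasso step
        w = Lasso(alpha=alpha).fit(X[neigh].T, X[i]).coef_
        W[i, neigh] = np.abs(w)
    return np.maximum(W, W.T)       # symmetrize (max rule, an assumption)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
W = knn_sparse_graph(X, k=10)
```

Restricting each solve to k neighbors is what makes the per-point problem size independent of n, which is the source of the scalability claimed for the k-NN fused Lasso graph.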

Computational complexity analysis

In this section, we give a theoretical analysis of the computational complexity of the graph construction methods involved in this paper. Four different graph construction methods are used: the k-NN graph, the original L1-graph (described in Section 2), the L1-graph with the k-NN method and kernel method (described in Section 3.2 and used as the L1-graph in the experiments in Section 6; we will refer to it as the new L1-graph for convenience), and our new k-NN

Experimental results

In this section, we conduct a variety of experiments to demonstrate the effectiveness of our k-NN fused Lasso graph and its advantage over the traditional L1-graph. We focus on testing the k-NN fused Lasso graph in the task of spectral clustering, leaving other graph-based machine learning tasks aside.
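Plugging a precomputed graph into spectral clustering is straightforward with standard tooling. In this sketch a simple Gaussian-kernel affinity on toy data stands in for the k-NN fused Lasso graph; any symmetric nonnegative weight matrix can be substituted via `affinity='precomputed'`.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# two well-separated blobs as toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(5, 0.3, size=(20, 2))])

# Gaussian-kernel affinity as a stand-in for the constructed graph
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-d2)                          # symmetric, nonnegative

labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(W)
```

The clustering quality thus depends entirely on the quality of `W`, which is why the graph construction method is the variable under study in the experiments.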

Conclusion

In this paper, we have proposed a new graph construction method based on the L1-graph and recent developments in sparse representation. Our main motivation is to overcome the traditional L1-graph's tendency to ignore the intrinsic structure of the data and to help it convey more valuable information in order to improve its performance. We have incorporated structured sparsity into our L1-graph construction, and more notably, the successful employment of the k-NN method and kernel

Acknowledgments

This work was supported by National Natural Science Foundation of China under Grants 61073084 and 61202231, Beijing Natural Science Foundation of China under Grants 4122035 and 4132037, Ph.D. Programs Foundation of Ministry of Education of China under Grant 20120001110097, and National Hi-Tech Research and Development Program (863 Program) of China under Grant 2012AA012503.

Zhiwu Lu received the M.Sc. degree in applied mathematics from Peking University, Beijing, China, in 2005, and the Ph.D. degree in computer science from City University of Hong Kong in 2011. Since March 2011, he has been an assistant professor with the Institute of Computer Science and Technology, Peking University. He has published over 30 papers in international journals and conference proceedings including TIP, TSMC-B, TMM, AAAI, ICCV, CVPR, ECCV, and ACM-MM. His research interests lie in machine learning, computer vision, and multimedia information retrieval.

References (26)

  • J. Shi, J. Malik, Normalized cuts and image segmentation, in: Proceedings of CVPR, 1997, pp. ...
  • Z. Lu, H. Ip, Constrained spectral clustering via exhaustive and efficient constraint propagation, in: Proceedings of ...
  • D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: Advances in ...
  • X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in: ...
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • X. Zhu, Semi-supervised learning literature survey, ...
  • P.P. Talukdar, Topics in Graph Construction for Semi-Supervised Learning, Technical Report, ...
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • S. Yan, H. Wang, Semi-supervised learning by sparse representation, in: Proceedings of SDM, 2009, pp. ...
  • B. Cheng et al., Learning with l1-graph for image analysis, IEEE Trans. Image Process. (2010)
  • R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B (1996)
  • H. Zou et al., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B (2005)
  • R. Tibshirani et al., Sparsity and smoothness via the fused Lasso, J. R. Stat. Soc. Ser. B (2005)


Yuxin Peng is a professor and the director of the Multimedia Information Processing Lab (MIPL) in the Institute of Computer Science and Technology (ICST), Peking University. He received the Ph.D. degree in computer application from the School of Electronics Engineering and Computer Science (EECS), Peking University, in July 2003, after which he worked as an assistant professor in ICST, Peking University. From August 2003 to November 2004, he was a visiting scholar with the Department of Computer Science, City University of Hong Kong. He was promoted to associate professor at Peking University in August 2005 and to professor in August 2010. In 2006, he was selected for the "Program for New Star in Science and Technology of Beijing" and the "Program for New Century Excellent Talents in University (NCET)". He has published over 50 papers in international journals and conference proceedings including TCSVT, TIP, ACM-MM, ICCV, CVPR, and AAAI. In 2009, he led his team to participate in TRECVID; across the six tasks of high-level feature extraction (HLFE) and search, his team won first place in four tasks and second place in the remaining two. He has also obtained 12 patents. His current research interests include multimedia information retrieval, computer vision, and pattern recognition.
