
Parallel Sparse Subspace Clustering via Joint Sample and Parameter Blockwise Partition

Published: 09 May 2017

Abstract

Sparse subspace clustering (SSC) is a classical method for clustering data in which each group lies in its own subspace. It has many desirable theoretical properties and has been shown to be effective in various applications. On large-scale datasets, however, learning the sparse sample affinity graph is computationally expensive. To address this computational cost, we develop a memory-efficient parallel framework for computing SSC via the alternating direction method of multipliers (ADMM). The proposed framework partitions the data matrix into column blocks and decomposes the original problem into parallel multivariate Lasso regression subproblems and samplewise operations, which allows multiple cores or machines to be allocated to the processing of individual column blocks. We propose a stochastic optimization algorithm to minimize the objective function. Experimental results on real-world datasets demonstrate that the proposed blockwise ADMM framework is substantially more efficient than its matrix counterpart used by SSC, without sacrificing performance in applications. Moreover, our approach is directly applicable to parallel neighborhood selection for structure estimation in Gaussian graphical models.
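
To make the column-block decomposition in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the standard SSC self-expression objective, 0.5 * ||X - XC||_F^2 + lam * ||C||_1 with a zero diagonal on C, and solves each column block with a plain deterministic ADMM Lasso solver rather than the stochastic algorithm the paper proposes. The function names and the parameters `lam`, `rho`, `n_iter`, and `n_blocks` are hypothetical, and a thread pool stands in for multiple cores or machines.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(a, t):
    """Elementwise proximal operator of t * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def lasso_block_admm(G, L, cols, lam, rho, n_iter):
    """ADMM for one column block of the self-expression problem.

    G is the Gram matrix X^T X, L the Cholesky factor of G + rho * I,
    and cols the sample indices whose coefficient columns are solved here.
    """
    n = G.shape[0]
    XtB = G[:, cols]                      # X^T X[:, cols]
    Z = np.zeros((n, len(cols)))
    U = np.zeros_like(Z)
    for _ in range(n_iter):
        # Least-squares step: (G + rho I) C = X^T B + rho (Z - U)
        rhs = XtB + rho * (Z - U)
        C = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # Sparsity step, then forbid each sample from representing itself
        Z = soft_threshold(C + U, lam / rho)
        Z[cols, np.arange(len(cols))] = 0.0
        # Scaled dual update
        U += C - Z
    return Z


def ssc_coefficients(X, n_blocks=4, lam=0.1, rho=1.0, n_iter=200):
    """Split the columns of X into blocks and solve the blocks in parallel."""
    n = X.shape[1]
    G = X.T @ X
    L = np.linalg.cholesky(G + rho * np.eye(n))
    blocks = np.array_split(np.arange(n), n_blocks)

    def solve(cols):
        return lasso_block_admm(G, L, cols, lam, rho, n_iter)

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(solve, blocks))
    return np.hstack(parts)               # n x n coefficient matrix
```

Because the coefficient columns do not interact in this objective, solving them block by block yields the same matrix as solving all columns at once. An affinity graph for spectral clustering could then be formed as `np.abs(C) + np.abs(C).T`, which is the usual SSC choice and an assumption here rather than something stated in the abstract.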

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 16, Issue 3
      Special Issue on Embedded Computing for IoT, Special Issue on Big Data and Regular Papers
      August 2017
      610 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3072970

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 May 2017
      Accepted: 01 February 2017
      Revised: 01 January 2017
      Received: 01 January 2016
      Published in TECS Volume 16, Issue 3

      Author Tags

      1. parallel optimization
      2. semi-supervised learning
      3. sparsity
      4. subspace clustering

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Natural Science Foundation of Jiangsu Province of China
      • National Natural Science Foundation of China

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 7
      • Downloads (Last 6 weeks): 2
      Reflects downloads up to 08 Mar 2025

      Citations

      Cited By

      • (2023) Integration of Rural Revitalization Strategy and Modernized Agricultural Governance Based on Intelligent Big Data Analysis. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns.2023.2.01404. Online publication date: 6-Dec-2023
      • (2023) Fed-SC: One-Shot Federated Subspace Clustering over High-Dimensional Data. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2905-2918. DOI: 10.1109/ICDE55515.2023.00222. Online publication date: Apr-2023
      • (2022) Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding. IEEE Transactions on Cybernetics 52:9, 9090-9100. DOI: 10.1109/TCYB.2021.3052056. Online publication date: Sep-2022
      • (2020) High-Dimensional Clustering for Incomplete Mixed Dataset Using Artificial Intelligence. IEEE Access 8, 69629-69638. DOI: 10.1109/ACCESS.2020.2986813. Online publication date: 2020
      • (2019) Generation of in-bounds inputs for arrays in memory-unsafe languages. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, 136-148. DOI: 10.5555/3314872.3314890. Online publication date: 16-Feb-2019
      • (2018) Community detection method based on mixed-norm sparse subspace clustering. Neurocomputing 275:C, 2150-2161. DOI: 10.1016/j.neucom.2017.10.060. Online publication date: 31-Jan-2018
      • (2018) Quantum technique for access control in cloud computing II. Journal of Network and Computer Applications 103:C, 178-184. DOI: 10.1016/j.jnca.2017.11.012. Online publication date: 1-Feb-2018
      • (2017) A distributed parallel algorithm for inferring hierarchical groups from large-scale text corpuses. Concurrency and Computation: Practice and Experience 30:11. DOI: 10.1002/cpe.4404. Online publication date: 20-Dec-2017
