Parallel Sparse Subspace Clustering via Joint Sample and Parameter Blockwise Partition

Authors:

Xiao-Tong Yuan,

Dimitris N. Metaxas

ACM Transactions on Embedded Computing Systems (TECS), Volume 16, Issue 3

Article No.: 75, Pages 1 - 17

https://doi.org/10.1145/3063316

Published: 09 May 2017

Abstract

Sparse subspace clustering (SSC) is a classical method for clustering data in which each group lies in its own subspace. It has many desirable theoretical properties and has been shown to be effective in various applications. On large-scale datasets, however, learning the sparse sample affinity graph is computationally expensive. To reduce this computational cost, we develop a memory-efficient parallel framework for computing SSC via an alternating direction method of multipliers (ADMM) algorithm. The proposed framework partitions the data matrix into column blocks and then decomposes the original problem into parallel multivariate Lasso regression subproblems and samplewise operations, which allows multiple cores or machines to be allocated to the processing of individual column blocks. We propose a stochastic optimization algorithm to minimize the objective function. Experimental results on real-world datasets demonstrate that the proposed blockwise ADMM framework is substantially more efficient than its matrix counterpart used by SSC, without sacrificing performance in applications. Moreover, our approach is directly applicable to parallel neighborhood selection for Gaussian graphical model structure estimation.
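
The column-block idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it partitions only the regression targets into column blocks (not the joint sample and parameter partition of the paper), uses a textbook ADMM solver for each multivariate Lasso subproblem, and runs the blocks sequentially. The function names `ssc_block_admm` and `ssc_affinity` and the parameters `lam`, `rho`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ssc_block_admm(X, cols, lam=0.1, rho=1.0, n_iter=200):
    """Illustrative ADMM for one column block (assumed formulation):
    min_C 0.5 * ||X[:, cols] - X C||_F^2 + lam * ||C||_1,
    subject to each sample not representing itself.
    """
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix of all samples
    L = np.linalg.cholesky(G + rho * np.eye(n))   # factor once, reuse each iteration
    XtB = G[:, cols]                              # X^T X[:, cols]
    Z = np.zeros((n, len(cols)))
    U = np.zeros_like(Z)
    for _ in range(n_iter):
        rhs = XtB + rho * (Z - U)
        # Ridge-like update: solve (G + rho I) C = rhs via the Cholesky factor.
        C = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # Sparse update: soft-threshold, then forbid self-representation.
        Z = soft_threshold(C + U, lam / rho)
        Z[cols, np.arange(len(cols))] = 0.0
        U += C - Z                                # scaled dual update
    return Z

def ssc_affinity(X, n_blocks=4, **kw):
    # Split the sample indices into column blocks; each block is an
    # independent subproblem that could be sent to its own core or machine.
    blocks = np.array_split(np.arange(X.shape[1]), n_blocks)
    C = np.hstack([ssc_block_admm(X, b, **kw) for b in blocks])
    return np.abs(C) + np.abs(C).T                # symmetric affinity graph
```

Because the subproblems share only the data matrix and its Gram factorization, the list comprehension in `ssc_affinity` is the natural place to substitute a process pool or a distributed map. The resulting affinity matrix would then be passed to a spectral clustering step, which this sketch omits.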

Index Terms

Parallel Sparse Subspace Clustering via Joint Sample and Parameter Blockwise Partition
1. Computing methodologies
  1. Machine learning
  2. Parallel computing methodologies

Published In

ACM Transactions on Embedded Computing Systems Volume 16, Issue 3

Special Issue on Embedded Computing for IoT, Special Issue on Big Data and Regular Papers

August 2017

610 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3072970

Editor:
Sandeep K. Shukla
Indian Institute of Technology, India

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 09 May 2017

Accepted: 01 February 2017

Revised: 01 January 2017

Received: 01 January 2016

Published in TECS Volume 16, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Natural Science Foundation of Jiangsu Province of China
National Natural Science Foundation of China
