Abstract
Subspace clustering methods partition data that lie in or close to a union of subspaces in accordance with the subspace structure. Methods with a sparsity prior, such as sparse subspace clustering (SSC) (Elhamifar and Vidal in IEEE Trans Pattern Anal Mach Intell 35(11):2765–2781, 2013), in which sparsity is induced by the \(\ell ^{1}\)-norm, have been demonstrated to be effective for subspace clustering. Most of these methods require certain assumptions on the subspaces, e.g., independence or disjointness. However, such assumptions are not guaranteed to hold in practice, and they limit the applicability of existing sparse subspace clustering methods. In this paper, we propose \(\ell ^{0}\)-induced sparse subspace clustering (\(\ell ^{0}\)-SSC). In contrast to the independence or disjointness assumptions required by most existing sparse subspace clustering methods, we prove that \(\ell ^{0}\)-SSC guarantees the subspace-sparse representation, a key element in subspace clustering, for arbitrary distinct underlying subspaces almost surely under a mild i.i.d. assumption on the data generation. We also present a “no free lunch” theorem showing that obtaining the subspace-sparse representation under our general assumptions cannot be much computationally cheaper than solving the corresponding \(\ell ^{0}\) sparse representation problem of \(\ell ^{0}\)-SSC. We develop a novel approximate algorithm, Approximate \(\ell ^{0}\)-SSC (A\(\ell ^{0}\)-SSC), which employs proximal gradient descent to obtain a sub-optimal solution to the optimization problem of \(\ell ^{0}\)-SSC with a theoretical guarantee. The sub-optimal solution is used to build a sparse similarity matrix on which spectral clustering is performed to produce the final clustering results. Extensive experiments on various data sets demonstrate the superiority of A\(\ell ^{0}\)-SSC over competing clustering methods. Furthermore, we extend \(\ell ^{0}\)-SSC to semi-supervised learning by performing label propagation on the sparse similarity matrix learnt by A\(\ell ^{0}\)-SSC, and we demonstrate the effectiveness of the resulting semi-supervised method, termed \(\ell ^{0}\)-sparse subspace label propagation (\(\ell ^{0}\)-SSLP).
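To make the A\(\ell ^{0}\)-SSC pipeline concrete, the sketch below shows one plausible implementation: for each data point, proximal gradient descent on the \(\ell ^{0}\)-regularized self-representation objective \(\min _{c}\Vert x_{i} - Xc\Vert _{2}^{2} + \lambda \Vert c\Vert _{0}\) subject to \(c_{i}=0\) (the proximal map of the \(\ell ^{0}\) penalty is hard thresholding), followed by spectral clustering on the induced similarity matrix. This is a minimal sketch under our own assumptions: the step size, threshold, and names such as lam and n_iters are illustrative choices, not the authors' reference implementation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def approx_l0_ssc(X, lam=0.1, n_iters=200):
    """Sketch of A l0-SSC: for each column x_i of X (features x samples),
    run proximal gradient descent on min_c ||x_i - X c||_2^2 + lam*||c||_0
    subject to c_i = 0 (no trivial self-representation)."""
    d, n = X.shape
    # Step size 1/L, where L = 2*||X||_2^2 bounds the Lipschitz constant
    # of the gradient of the smooth least-squares term.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    # The proximal map of step*lam*||.||_0 is hard thresholding at sqrt(2*lam*step).
    thresh = np.sqrt(2.0 * lam * step)
    C = np.zeros((n, n))
    for i in range(n):
        x, c = X[:, i], np.zeros(n)
        for _ in range(n_iters):
            c = c - step * 2.0 * (X.T @ (X @ c - x))  # gradient step
            c[np.abs(c) < thresh] = 0.0               # hard thresholding (l0 prox)
            c[i] = 0.0                                # enforce c_i = 0
        C[:, i] = c
    return C

# Usage: build a sparse similarity matrix from the representation codes,
# then apply spectral clustering (Ng et al. 2001) with k clusters.
# C = approx_l0_ssc(X, lam=0.1)
# W = 0.5 * (np.abs(C) + np.abs(C).T)
# labels = SpectralClustering(n_clusters=k, affinity="precomputed").fit_predict(W)
```

For the \(\ell ^{0}\)-SSLP extension, label propagation on the learned similarity matrix W could follow, for instance, the scheme of Zhou et al. (2003) from the reference list; again a hedged sketch rather than the authors' exact procedure:

```python
def label_propagation(W, y, alpha=0.99):
    """Zhou et al. (2003)-style propagation on the learned similarity W.
    y: length-n integer array; class index for labeled points, -1 if unlabeled."""
    n = W.shape[0]
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))
    labeled = np.where(y >= 0)[0]
    Y[labeled, np.searchsorted(classes, y[labeled])] = 1.0  # one-hot seed labels
    d = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(d, d))                # D^{-1/2} W D^{-1/2}
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)  # closed-form fixed point
    return classes[F.argmax(axis=1)]
```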





Notes
Continuous distribution here indicates that the data distribution is non-degenerate in the sense that the probability measure of any hyperplane of dimension less than that of the subspace is 0.
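For concreteness, one hedged formalization of this footnote (the symbols \(\mathcal{S}\), \(d\), and \(\mu\) for the subspace, its dimension, and the distribution of the data on it are our notation, not necessarily the paper's): if the data in a subspace \(\mathcal{S}\) of dimension \(d\) are drawn i.i.d. from a distribution \(\mu\) supported on \(\mathcal{S}\), non-degeneracy requires
\[ \mu (\mathcal{H}) = 0 \quad \text{for every hyperplane } \mathcal{H} \subset \mathcal{S} \text{ with } \dim (\mathcal{H}) < d, \]
so that almost surely no sample lies on a fixed lower-dimensional hyperplane of \(\mathcal{S}\).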
References
Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Bhatia, K., Jain, P., & Kar, P. (2015). Robust regression via hard thresholding. In Advances in neural information processing systems 28: Annual conference on neural information processing systems 2015 (pp. 721–729). Montreal, December 7–12, 2015.
Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1–2), 459–494.
Candès, E., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203–4215.
Candès, E. J. (2008). The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9–10), 589–592.
Cheng, H., Liu, Z., Yang, L., & Chen, X. (2013). Sparse representation and learning in visual recognition: Theory and applications. Signal Processing, 93(6), 1408–1425.
Cheng, B., Yang, J., Yan, S., Fu, Y., & Huang, T. S. (2010). Learning with l1-graph for image analysis. IEEE Transactions on Image Processing, 19(4), 858–866.
Dyer, E. L., Sankaranarayanan, A. C., & Baraniuk, R. G. (2013). Greedy feature selection for subspace clustering. Journal of Machine Learning Research, 14, 2487–2517.
Elhamifar, E., & Vidal, R. (2011). Sparse manifold clustering and embedding. In NIPS (pp. 55–63).
Elhamifar, E., & Vidal, R. (2013). Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781.
Hyder, M., & Mahata, K. (2009). An approximate \(\ell ^{0}\) norm minimization algorithm for compressed sensing. In IEEE international conference on acoustics, speech and signal processing, 2009. ICASSP 2009 (pp. 3365–3368).
Jenatton, R., Mairal, J., Bach, F. R., & Obozinski, G. R. (2010). Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 487–494).
Karasuyama, M., & Mamitsuka, H. (2013). Manifold-based similarity adaptation for label propagation. In Advances in neural information processing systems 26: 27th annual conference on neural information processing systems 2013 (pp. 1547–1555). Proceedings of a meeting held December 5–8, 2013, Lake Tahoe.
Li, J., Kong, Y., & Fu, Y. (2017). Sparse subspace clustering by learning approximation \(\ell ^{0}\) codes. In Proceedings of the 31st AAAI conference on artificial intelligence (pp. 2189–2195). San Francisco, February 4–9, 2017.
Liu, G., Lin, Z., & Yu, Y. (2010). Robust subspace segmentation by low-rank representation. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 663–670). Haifa, June 21–24, 2010.
Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184.
Mairal, J., Bach, F. R., Ponce, J., Sapiro, G., & Zisserman, A. (2008). Supervised dictionary learning. In Advances in neural information processing systems 21: Proceedings of the 22nd annual conference on neural information processing systems (pp. 1033–1040). Vancouver, December 8–11, 2008.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11, 19–60.
Mancera, L., & Portilla, J. (2006). L0-norm-based sparse representation through alternate projections. In 2006 IEEE international conference on image processing (pp. 2089–2092).
Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In NIPS (pp. 849–856).
Park, D., Caramanis, C., & Sanghavi, S. (2014). Greedy subspace clustering. In Advances in neural information processing systems 27: Annual conference on neural information processing systems 2014 (pp. 2753–2761). Montreal, December 8–13, 2014.
Peng, X., Yi, Z., & Tang, H. (2015). Robust subspace clustering via thresholding ridge regression. In AAAI conference on artificial intelligence (AAAI) (pp. 3827–3833).
Plummer, M. D., & Lovász, L. (1986). Matching theory. Amsterdam: North-Holland Mathematics Studies, Elsevier.
Soltanolkotabi, M., & Candès, E. J. (2012). A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4), 2195–2238.
Soltanolkotabi, M., Elhamifar, E., & Candès, E. J. (2014). Robust subspace clustering. The Annals of Statistics, 42(2), 669–699.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine, 28(2), 52–68.
Wang, Y., & Xu, H. (2013). Noisy sparse subspace clustering. In Proceedings of the 30th international conference on machine learning, ICML 2013 (pp. 89–97). Atlanta, June 16–21, 2013.
Wang, Z., Gu, Q., Ning, Y., & Liu, H. (2015). High dimensional EM algorithm: Statistical optimization and asymptotic normality. In Advances in neural information processing systems 28: Annual conference on neural information processing systems 2015 (pp. 2521–2529). Montreal, December 7–12, 2015.
Wang, Y., Wang, Y. X., & Singh, A. (2016). Graph connectivity in noisy sparse subspace clustering. CoRR, abs/1504.01046.
Wang, Y. X., Xu, H., & Leng, C. (2013). Provable subspace clustering: When LRR meets SSC. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 64–72). New York: Curran Associates, Inc.
Yan, S., & Wang, H. (2009). Semi-supervised learning by sparse representation. In SDM (pp. 792–801).
Yang, Y., Feng, J., Jojic, N., Yang, J., & Huang, T. S. (2016). \(\ell ^{0}\)-sparse subspace clustering. In Computer vision—ECCV 2016—14th European conference, Amsterdam, October 11–14, 2016, Proceedings, Part II (pp. 731–747). https://doi.org/10.1007/978-3-319-46475-6_45.
Yang, Y., Wang, Z., Yang, J., Han, J., & Huang, T. (2014a). Regularized l1-graph for data clustering. In Proceedings of the British machine vision conference. BMVA Press.
Yang, Y., Wang, Z., Yang, J., Wang, J., Chang, S., & Huang, T. S. (2014b). Data clustering by Laplacian regularized l1-graph. In Proceedings of the 28th AAAI conference on artificial intelligence (pp. 3148–3149). Québec City, July 27–31, 2014.
Yang, J., Yu, K., Gong, Y., & Huang, T. S. (2009). Linear spatial pyramid matching using sparse coding for image classification. In CVPR (pp. 1794–1801).
You, C., Robinson, D., & Vidal, R. (2016). Scalable sparse subspace clustering by orthogonal matching pursuit. In IEEE Conference on computer vision and pattern recognition (CVPR) (pp. 3918–3927). Las Vegas, NV, June 27–30, 2016. https://doi.org/10.1109/CVPR.2016.425.
Zhang, T., Ghanem, B., Liu, S., Xu, C., & Ahuja, N. (2013). Low-rank sparse coding for image classification. In IEEE international conference on computer vision, ICCV 2013 (pp. 281–288). Sydney, December 1–8, 2013.
Zhang, C. H., & Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4), 576–593.
Zheng, X., Cai, D., He, X., Ma, W. Y., & Lin, X. (2004). Locality preserving clustering for image database. In Proceedings of the 12th annual ACM international conference on multimedia, MULTIMEDIA’04 (pp. 885–891). New York: ACM.
Zheng, M., Bu, J., Chen, C., Wang, C., Zhang, L., Qiu, G., et al. (2011). Graph regularized sparse coding for image representation. IEEE Transactions on Image Processing, 20(5), 1327–1336.
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2003). Learning with local and global consistency. In Advances in neural information processing systems 16: Neural information processing systems, NIPS 2003. Vancouver, December 8–13, 2003.
Zhu, X., Ghahramani, Z., & Lafferty, J. D. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th international conference on machine learning (ICML 2003) (pp. 912–919). Washington, August 21–24, 2003.
Additional information
Communicated by Edwin Hancock, Richard Wilson, Will Smith, Adrian Bors, Nick Pears.
This material is based in part upon work supported by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The work of Yingzhen Yang was supported in part by an IBM gift grant to Beckman Institute, UIUC. The work of Jiashi Feng was partially supported by National University of Singapore startup Grant R-263-000-C08-133 and Ministry of Education of Singapore AcRF Tier One Grant R-263-000-C21-112.
About this article
Cite this article
Yang, Y., Feng, J., Jojic, N. et al. Subspace Learning by \(\ell ^{0}\)-Induced Sparsity. Int J Comput Vis 126, 1138–1156 (2018). https://doi.org/10.1007/s11263-018-1092-4