Abstract
Spectral clustering is a popular learning paradigm that uses the eigenvectors and eigenvalues of an appropriate input matrix to approximate the clustering objective. Despite its empirical success in diverse application areas, spectral clustering has been criticized for its inefficiency on large datasets. This is mainly because the complexity of most eigenvector algorithms is cubic in the number of instances, and even memory-efficient iterative eigensolvers (such as the Power Method) may converge very slowly to the desired eigenvector solutions. In this paper, inspired by related work on PageRank, we propose a semi-supervised framework for spectral clustering that provably improves the efficiency of the Power Method for computing the spectral clustering solution. The proposed method is particularly well suited to large, sparse matrices, where it is demonstrated to converge to the eigenvector solution within just a few Power Method iterations. The proposed framework offers a novel perspective on semi-supervised spectral methods and demonstrates that the efficiency of spectral clustering can be enhanced not only by data compression but also by introducing an appropriate supervised bias into the input Laplacian matrix. Apart from the efficiency gains, the proposed framework is also shown to improve the quality of the derived cluster models.
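To make the slow-convergence remark concrete, the following is a minimal sketch of the plain Power Method, not the semi-supervised framework proposed in the paper. It relies on the standard fact that the error shrinks roughly by a factor of |λ2/λ1| per iteration, so a small gap between the two largest eigenvalues means many iterations. The function name `power_method`, the tolerance, and the two synthetic diagonal matrices are assumptions chosen for illustration; they stand in for an input matrix with a small and a large eigengap and are not the Laplacians used in the paper.

```python
import numpy as np


def power_method(A, tol=1e-10, max_iter=10_000):
    """Estimate the dominant eigenvector of A by repeated multiplication.

    Returns (eigenvector estimate, number of iterations used).
    Hypothetical helper written for this illustration only.
    """
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)  # deterministic start vector
    for k in range(1, max_iter + 1):
        y = A @ x
        y /= np.linalg.norm(y)
        # Eigenvectors are defined up to sign, so compare both orientations.
        if min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol:
            return y, k
        x = y
    return x, max_iter


# Synthetic examples (assumed, not from the paper): same dominant
# eigenvalue, different second eigenvalue, hence different eigengap.
small_gap = np.diag([1.0, 0.99])  # lambda2 / lambda1 = 0.99
large_gap = np.diag([1.0, 0.50])  # lambda2 / lambda1 = 0.50

_, slow_iters = power_method(small_gap)
v, fast_iters = power_method(large_gap)

print("iterations with small eigengap:", slow_iters)
print("iterations with large eigengap:", fast_iters)
```

In this toy setting the matrix with the larger eigengap needs far fewer iterations to reach the same tolerance, which is the effect that any modification of the input matrix would have to achieve in order to speed up the Power Method.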
Additional information
Responsible editors: José L Balcázar, Francesco Bonchi, Aristides Gionis, Michèle Sebag.
About this article
Cite this article
Mavroeidis, D. Accelerating spectral clustering with partial supervision. Data Min Knowl Disc 21, 241–258 (2010). https://doi.org/10.1007/s10618-010-0191-9
DOI: https://doi.org/10.1007/s10618-010-0191-9