A Scalable Spectral Clustering Algorithm Based on Landmark-Embedding and Cosine Similarity

Chen, Guangliang

doi:10.1007/978-3-319-97785-0_6

Guangliang Chen¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11004))

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

1287 Accesses

Abstract

We extend our recent work on scalable spectral clustering with cosine similarity (ICPR’18) to other kinds of similarity functions, in particular, the Gaussian RBF. In the previous work, we showed that for sparse or low-dimensional data, spectral clustering with the cosine similarity can be implemented directly through efficient operations on the data matrix such as elementwise manipulation, matrix-vector multiplication and low-rank SVD, thus completely avoiding the weight matrix. For other similarity functions, we present an embedding-based approach that uses a small set of landmark points to convert the given data into sparse feature vectors and then applies the scalable computing framework for the cosine similarity. Our algorithm is simple to implement, has clear interpretations, and naturally incorporates an outliers removal procedure. Preliminary results show that our proposed algorithm yields higher accuracy than existing scalable algorithms while running fast.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Landmark-Based Spectral Clustering with Local Similarity Representation

MATLAB Implementation Details of a Scalable Spectral Clustering Algorithm with the Cosine Similarity

Clustering High-Dimensional Data via Spectral Clustering Using Collaborative Representation Coefficients

Notes

1.
Available at http://yann.lecun.com/exdb/mnist/.
2.
This is similarity-based feature representation. Note that there is also work on dissimilarity representation [7, 13].
3.
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
4.
Code available at http://www.cad.zju.edu.cn/home/dengcai/Data/Clustering.html.
5.
This empirical rule is derived as $\ell =\frac{1}{2}\cdot \sqrt{\frac{n}{k}} \cdot k = \frac{1}{2}\sqrt{nk}$, with the intuition that the value of $\ell $ should be proportional to both the (average) cluster size and number of clusters. For the data sets in Table 1, such an $\ell $ is always a few hundred.

References

Aggarwal, C.C., Zhai, C.: A survey of text clustering algorithms. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 77–128. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_4
Chapter Google Scholar
Cai, D., Chen, X.: Large scale spectral clustering via landmark-based sparse representation. IEEE Trans. Cybern. 45(8), 1669–1680 (2015)
Article Google Scholar
Chen, G.: Scalable spectral clustering with cosine similarity. In: Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China (2018)
Google Scholar
Jain, S., Munos, R., Stephan, F., Zeugmann, T. (eds.): ALT 2013. LNCS (LNAI), vol. 8139. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40935-6
Book Google Scholar
Chung, F.R.K.: Spectral graph theory. In: CBMS Regional Conference Series in Mathematics, vol. 92. AMS (1996)
Google Scholar
Coifman, R., Lafon, S.: Diffusion maps. Appl. Comput. Harmonic Anal. 21(1), 5–30 (2006)
Article MathSciNet Google Scholar
Duin, R., Pekalska, E.: The dissimilarity space: bridging structural and statistical pattern recognition. Pattern Recogn. Lett. 33(7), 826–832 (2012)
Article Google Scholar
Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)
Article Google Scholar
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Article MathSciNet Google Scholar
Meila, M., Shi, J.: A random walks view of spectral segmentation. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (2001)
Google Scholar
Moazzen, Y., Tasdemir, K.: Sampling based approximate spectral clustering ensemble for partitioning data sets. In: Proceedings of the 23rd International Conference on Pattern Recognition (2016)
Google Scholar
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 849–856 (2001)
Google Scholar
Pekalska, E., Duin, R.: The Dissimilarity Representation for Pattern Recognition. World Scientific, Singapore (2005)
Book Google Scholar
Pham, K., Chen, G.: Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs. In: Proceedings of the 12th Workshop on Graph-based Natural Language Processing (TextGraphs-2012), pp. 28–37. Association for Computational Linguistics (2018)
Google Scholar
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Article Google Scholar
Tasdemir, K.: Vector quantization based approximate spectral clustering of large datasets. Pattern Recogn. 45(8), 3034–3044 (2012)
Article Google Scholar
Wang, L., Leckie, C., Kotagiri, R., Bezdek, J.: Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recogn. 44, 222–235 (2011)
Article Google Scholar
Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Approximate spectral clustering. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 134–146. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_15
Chapter Google Scholar
Yan, D., Huang, L., Jordan, M.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916 (2009)
Google Scholar

Download references

Acknowledgments

We thank the anonymous reviewers for their helpful feedback. This work was motivated by a project sponsored by Verizon Wireless, which had the goal of grouping customers based on similar profile characteristics. G. Chen was supported by the Simons Foundation Collaboration Grant for Mathematicians.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, San José State University, San José, CA, 95192, USA
Guangliang Chen

Authors

Guangliang Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guangliang Chen .

Editor information

Editors and Affiliations

Beihang University, Beijing, China
Xiao Bai
University of York, York, United Kingdom
Edwin R. Hancock
IBM Research – Thomas J. Watson Research, Yorktown Heights, New York, USA
Tin Kam Ho
University of York, Heslington, York, United Kingdom
Richard C. Wilson
University of Cagliari, Cagliari, Italy
Battista Biggio
Data 61 - CSIRO, Canberra, Aust Capital Terr, Australia
Antonio Robles-Kelly

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, G. (2018). A Scalable Spectral Clustering Algorithm Based on Landmark-Embedding and Cosine Similarity. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science(), vol 11004. Springer, Cham. https://doi.org/10.1007/978-3-319-97785-0_6

Download citation

DOI: https://doi.org/10.1007/978-3-319-97785-0_6
Published: 02 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97784-3
Online ISBN: 978-3-319-97785-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Scalable Spectral Clustering Algorithm Based on Landmark-Embedding and Cosine Similarity