Random Projection for k-means Clustering

Sieranoja, Sami; Fränti, Pasi

doi:10.1007/978-3-319-91253-0_63


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10841)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing


Abstract

We study how much k-means can be improved when it is initialized by random projections. The first variant selects two random data points and projects all points onto the axis defined by those two points. The second variant chooses the second point with the furthest-point heuristic. When repeated 100 times, the cluster-level error of a single run of k-means is reduced from CI = 4.5 to 0.8, on average. We also propose a simple projective indicator that predicts when the projection heuristic is expected to work well.
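The two projection variants described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how initial centroids are derived from the projection, so the sketch assumes one plausible choice: sort the points by their projected value, cut the ordering into k groups of nearly equal size, and use each group's mean as an initial centroid. It also assumes "furthest point" means the point furthest from the first chosen point. The names `project_onto_axis`, `furthest_point`, and `projection_init` are hypothetical.

```python
import random


def project_onto_axis(points, a, b):
    """Scalar position of each point along the line through a and b."""
    direction = [bi - ai for ai, bi in zip(a, b)]
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("axis endpoints must be distinct")
    return [
        sum((xi - ai) * di for xi, ai, di in zip(x, a, direction)) / norm_sq
        for x in points
    ]


def furthest_point(points, a):
    """Point with the largest squared Euclidean distance from a."""
    return max(points, key=lambda x: sum((xi - ai) ** 2 for xi, ai in zip(x, a)))


def projection_init(points, k, use_furthest=False, rng=random):
    """Initial centroids from a one-dimensional projection.

    Assumption (not stated in the abstract): centroids are the means of
    k nearly equal-sized groups of the points sorted along the axis.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    a = rng.choice(points)
    if use_furthest:
        # Second variant: second axis point is the one furthest from a.
        b = furthest_point(points, a)
    else:
        # First variant: second axis point is random, but distinct from a.
        candidates = [p for p in points if p != a]
        if not candidates:
            raise ValueError("need at least two distinct points")
        b = rng.choice(candidates)
    proj = project_onto_axis(points, a, b)
    order = sorted(range(len(points)), key=proj.__getitem__)
    n, dim = len(points), len(points[0])
    centroids = []
    for j in range(k):
        group = order[j * n // k:(j + 1) * n // k]
        centroids.append(
            [sum(points[i][d] for i in group) / len(group) for d in range(dim)]
        )
    return centroids
```

A call such as `projection_init(points, k, use_furthest=True)` yields one candidate initialization. Repeating the call with different random choices and keeping the best k-means result would correspond to the repeated setting mentioned in the abstract.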



Author information

Authors and Affiliations

School of Computing, University of Eastern Finland, Joensuu, Finland
Sami Sieranoja & Pasi Fränti

Authors

Sami Sieranoja
Pasi Fränti

Corresponding author

Correspondence to Pasi Fränti.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada


About this paper

Cite this paper

Sieranoja, S., Fränti, P. (2018). Random Projection for k-means Clustering. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_63


DOI: https://doi.org/10.1007/978-3-319-91253-0_63
Published: 11 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91252-3
Online ISBN: 978-3-319-91253-0
eBook Packages: Computer Science, Computer Science (R0)
