Random Projection for k-means Clustering

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10841)

Abstract

We study how much k-means can be improved when it is initialized by random projections. The first variant picks two random data points and projects all points onto the axis defined by these two points. The second variant picks the second point with the furthest-point heuristic. When the procedure is repeated 100 times, the cluster-level error (CI) drops from 4.5 for a single run of k-means to 0.8, on average. We also propose a simple projective indicator that predicts when the projection heuristic is expected to work well.
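The two projection variants can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the abstract does not say how the projected points are turned into initial centroids, so sorting the points along the axis and averaging k equal-sized groups is an assumption made here, as are the function name `projection_init` and its parameters.

```python
import math
import random


def projection_init(points, k, furthest=False, seed=None):
    """Return k initial centroids derived from a 1-D projection of the data.

    Assumes 1 <= k <= len(points). The grouping step (sort, then split into
    k equal-sized groups) is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    a = rng.choice(points)  # first reference point: random data point
    if furthest:
        # Second variant: the point furthest from a.
        b = max(points, key=lambda p: math.dist(p, a))
    else:
        # First variant: another random data point.
        b = rng.choice(points)

    axis = [bi - ai for ai, bi in zip(a, b)]
    if not any(axis):
        # a and b coincide, so no axis is defined; fall back to random points.
        return [list(p) for p in rng.sample(points, k)]

    def proj(p):
        # Unnormalised scalar projection onto the axis; only the order matters.
        return sum((pi - ai) * xi for pi, ai, xi in zip(p, a, axis))

    ordered = sorted(points, key=proj)
    n = len(ordered)
    centroids = []
    for j in range(k):
        group = ordered[j * n // k:(j + 1) * n // k]
        centroids.append([sum(c) / len(group) for c in zip(*group)])
    return centroids
```

Repeating the heuristic then amounts to calling `projection_init` with different seeds, running k-means from each result, and keeping the best solution under some objective such as the sum of squared errors; that selection criterion is likewise an assumption of this sketch.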

Author information

Corresponding author

Correspondence to Pasi Fränti.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sieranoja, S., Fränti, P. (2018). Random Projection for k-means Clustering. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_63

  • DOI: https://doi.org/10.1007/978-3-319-91253-0_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91252-3

  • Online ISBN: 978-3-319-91253-0

  • eBook Packages: Computer Science, Computer Science (R0)
