Abstract
We study how much the k-means can be improved if initialized by random projections. The first variant takes two random data points and projects the points to the axis defined by these two points. The second one uses furthest point heuristic for the second point. When repeated 100 times, cluster level errors of a single run of k-means reduces from CI = 4.5 to 0.8, on average. We also propose simple projective indicator that predicts when the projection-heuristic is expected to work well.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Al-Daoud, M.B., Roberts, S.A.: New methods for the initialisation of clusters. Pattern Recogn. Lett. 17(5), 451–455 (1996)
Anderberg, M.R.: Cluster Analysis for Applications. Academic Press, NewYork (1973)
Bai, L., Cheng, X., Liang, J., Shen, H., Guo, Y.: Fast density clustering strategies based on the k-means algorithm. Pattern Recogn. 71, 375–386 (2017)
Boley, D.: Principal direction divisive partitioning. Data Min. Knowl. Disc. 2(4), 325–344 (1998)
Boutsidis, C., Zouzias, A., Mahoney, M.W., Drineas, P.: Randomized dimensionality reduction for k-means clustering. IEEE Trans. Inf. Theory 61(2), 1045–1062 (2015)
Cardoso, A., Wichert, A.: Iterative random projections for high-dimensional data clustering. Pattern Recogn. Lett. 33, 1749–1755 (2012)
Carraher, L.A., Wilsey, P.A., Moitra, A., Dey, S.: Random projection clustering on streaming data. In: IEEE International Conference on Data Mining Workshops, pp. 708–715 (2016)
Celebi, M.E., Kingravi, H.A., Vela, P.A.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40, 200–210 (2013)
Cleju, I., Fränti, P., Wu, X.: Clustering based on principal curve. In: Kalviainen, H., Parkkinen, J., Kaarna, A. (eds.) SCIA 2005. LNCS, vol. 3540, pp. 872–881. Springer, Heidelberg (2005). https://doi.org/10.1007/11499145_88
Dasgupta, S.: Experiments with random projection. In: Uncertainty in Artificial Intelligence, pp. 143–151 (2000)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Erisoglu, M., Calis, N., Sakallioglu, S.: A new algorithm for initial cluster centers in k-means algorithm. Pattern Recogn. Lett. 32(14), 1701–1705 (2011)
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: International Conference on Machine Learning (ICMC), Washington, DC (2003)
Fränti, P.: Genetic algorithm with deterministic crossover for vector quantization. Pattern Recogn. Lett. 21(1), 61–68 (2000)
Fränti, P., Kaukoranta, T., Nevalainen, O.: On the splitting method for VQ codebook generation. Opt. Eng. 36(11), 3043–3051 (1997)
Fränti, P.: Efficiency of random swap clustering. J. Big Data 5(13), 1–29 (2018)
Fränti, P., Rezaei, M., Zhao, Q.: Centroid index: cluster level similarity measure. Pattern Recogn. 47(9), 3034–3045 (2014)
Fränti, P., Tuononen, M., Virmajoki, O.: Deterministic and randomized local search algorithms for clustering. In: IEEE International Conference on Multimedia and Expo, Hannover, Germany, pp. 837–840, June 2008
Fränti, P., Virmajoki, O.: Iterative shrinking method for clustering problems. Pattern Recogn. 39(5), 761–765 (2006)
Fränti, P., Virmajoki, O., Hautamäki, V.: Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1875–1881 (2006)
González, R., Tou, J.: Pattern Recognition Principles. Addison-Wesley, Boston (1974)
He, J., Lan, M., Tan, C.-L., Sung, S.-Y., Low, H.-B.: Initialization of cluster refinement algorithms: a review and comparative study. In: IEEE International Joint Conference on Neural Networks (2004)
Huang, C.-M., Harris, R.W.: A comparison of several vector quantization codebook generation approaches. IEEE Trans. Image Process. 2(1), 108–112 (1993)
Kaukoranta, T., Fränti, P., Nevalainen, O.: A fast exact GLA based on code vector activity detection. IEEE Trans. Image Process. 9(8), 1337–1342 (2000)
Krishna, K., Murty, M.N.: Genetic k-means algorithm. IEEE Trans. Syst. Man Cybern. Part B 29(3), 433–439 (1999)
Kärkkäinen, I., Fränti, P.: Dynamic local search algorithm for the clustering problem. Research Report A-2002-6 (2002)
Peña, J.M., Lozano, J.A., Larrañaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn. Lett. 20(10), 1027–1040 (1999)
Ra, S.-W., Kim, J.-K.: A fast mean-distance-ordered partial codebook search algorithm for image vector quantization. IEEE Trans. Circ. Syst. 40, 576–579 (1993)
Rezaei, M., Fränti, P.: Set-matching methods for external cluster validity. IEEE Trans. Knowl. Data Eng. 28(8), 2173–2186 (2016)
Steinley, D., Brusco, M.J.: Initializing k-means batch clustering: a critical evaluation of several techniques. J. Classif. 24, 99–121 (2007)
Su, T., Dy, J.G.: In search of deterministic methods for initializing k-means and Gaussian mixture clustering. Intell. Data Anal. 11(4), 319–338 (2007)
Wu, X.: Optimal quantization by matrix searching. J. Algorithms 12(4), 663–673 (1991)
Wu, X., Zhang, K.: A better tree-structured vector quantizer. In: IEEE Data Compression Conference, Snowbird, UT, pp. 392–401 (1991)
Yan, D., Huang, L., Jordan, M.I.: Fast approximate spectral clustering. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 907–916, 2009
Yedla, M., Pathakota, S.R., Srinivasa, T.M.: Enhancing k-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 1(2), 121–125 (2010)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: a new data clustering algorithm and its applications. Data Min. Knowl. Disc. 1(2), 141–182 (1997)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Sieranoja, S., Fränti, P. (2018). Random Projection for k-means Clustering. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science(), vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_63
Download citation
DOI: https://doi.org/10.1007/978-3-319-91253-0_63
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91252-3
Online ISBN: 978-3-319-91253-0
eBook Packages: Computer ScienceComputer Science (R0)