Abstract
K-means is a well-known clustering algorithm in data mining and machine learning. It is widely used in domains such as computer vision, market segmentation, and social network analysis. However, k-means spends a large amount of time on unnecessary distance calculations, so accelerating it has become an important topic. Accelerated exact k-means algorithms produce the same result as k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than state-of-the-art accelerated k-means algorithms. Its additional memory consumption is also much lower than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means by automatically grouping points during the iterations, which lets it balance the cost of distance calculations against the cost of filtering. We conduct extensive experiments on real-world datasets. The experiments verify that Fission-Fusion k-means considerably outperforms state-of-the-art accelerated k-means algorithms, especially when the datasets are low-dimensional and the number of clusters is large. In addition, on well-separated, naturally clustered datasets, the speedup of our algorithm over other accelerated k-means algorithms is even greater.
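To make the notion of "unnecessary distance calculations" concrete, the following minimal Python sketch compares a naive assignment step with one that skips centers using the standard triangle-inequality test. It is not the Fission-Fusion algorithm, whose grouping scheme the abstract does not describe; the function names `assign_naive` and `assign_pruned` and the synthetic data are assumptions made purely for illustration.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_naive(points, centers):
    """Assign each point to its nearest center, computing all n*k distances."""
    labels, n_dist = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        n_dist += 1
        for j in range(1, len(centers)):
            d = dist(x, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


def assign_pruned(points, centers):
    """Same assignment, but skip centers ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be strictly closer and its distance need not be computed.
    The returned count covers point-to-center distances only; the
    k*(k-1)/2 center-to-center distances are extra overhead.
    """
    k = len(centers)
    cc = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            cc[i][j] = cc[j][i] = dist(centers[i], centers[j])

    labels, n_dist = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        n_dist += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned: c_j cannot beat the current best
            d = dist(x, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical, well-separated 2-D blobs used only for this demo.
    centers = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]
    points = [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
              for cx, cy in centers for _ in range(50)]
    naive_labels, naive_count = assign_naive(points, centers)
    pruned_labels, pruned_count = assign_pruned(points, centers)
    print("same labels:", naive_labels == pruned_labels)
    print("point-center distances:", naive_count, "vs", pruned_count)
```

Both functions return the same labels, which is what "exact" acceleration means here. The pruned version pays for this with extra bookkeeping (the center-to-center table), a small instance of the trade-off between distance calculations and filtering cost that the abstract mentions.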
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yu, Q., Dai, BR. (2017). Accelerating K-Means by Grouping Points Automatically. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_15
DOI: https://doi.org/10.1007/978-3-319-64283-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science, Computer Science (R0)