ABSTRACT
We present three optimization techniques for the k-means clustering algorithm that improve running time without significantly decreasing the accuracy of the cluster centers. Our first optimization restructures loops to improve cache behavior when executing on multicore architectures. The remaining two optimizations skip selected points to reduce execution latency. Our sensitivity analysis suggests that performance can be enhanced through a good understanding of the data and careful configuration of the parameters.
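The abstract does not say how points are chosen for skipping, so the following is only a minimal sketch of the general idea of computation skipping, not the authors' method. It assumes a hypothetical criterion: a point whose cluster assignment has stayed the same for `stable_threshold` consecutive iterations is no longer re-evaluated. The function names, the parameter `stable_threshold`, and the `distance_evaluations` counter are all illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans_with_skipping(points, centers, max_iters=20, stable_threshold=3):
    """Lloyd-style k-means that skips points with stable assignments.

    ASSUMPTION: the skipping rule (assignment unchanged for
    `stable_threshold` consecutive iterations) is hypothetical and is
    not taken from the paper.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    assignment = [nearest_center(p, centers) for p in points]
    stable_count = [0] * len(points)
    distance_evaluations = len(points) * len(centers)

    for _ in range(max_iters):
        # Update step: recompute each center as the mean of its points.
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p, a in zip(points, assignment):
            counts[a] += 1
            for d in range(dim):
                sums[a][d] += p[d]
        for j in range(len(centers)):
            if counts[j] > 0:  # leave empty clusters where they are
                centers[j] = [s / counts[j] for s in sums[j]]

        # Assignment step: skip points considered stable.
        changed = False
        for i, p in enumerate(points):
            if stable_count[i] >= stable_threshold:
                continue  # skipped: no distance computations for this point
            new_a = nearest_center(p, centers)
            distance_evaluations += len(centers)
            if new_a == assignment[i]:
                stable_count[i] += 1
            else:
                assignment[i] = new_a
                stable_count[i] = 0
                changed = True
        if not changed:
            break

    return centers, assignment, distance_evaluations


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment, distance_evaluations = kmeans_with_skipping(
    points, [(0, 0), (10, 10)])
```

A lower `stable_threshold` skips more work but risks freezing points that would have moved later, which is one way the trade-off between running time and accuracy mentioned in the abstract could show up in practice.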
Index Terms
- Improving the performance of k-means clustering through computation skipping and data locality optimizations