Abstract
k-means has recently been recognized as one of the best algorithms for clustering unlabeled data. Because k-means relies mainly on computing the distance between every data point and every center, its cost is high when the dataset is large (for example, more than 500MG points). We propose a two-stage algorithm that reduces the cost of this calculation for huge datasets. The first stage is a fast calculation on a small portion of the data that produces good locations for the centers. The second stage is a slow calculation whose initial centers are taken from the first stage. The names fast and slow describe the movement of the centers in each stage. In the slow stage, the whole dataset can be used to obtain the exact location of the centers. The cost of the fast stage is very low because so little data is used. The cost of the slow stage is also small because it needs only a few iterations.
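The two stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the small portion is chosen, how the first-stage centers are seeded, or what the stopping rule is, so uniform random sampling, random seeding from the sample, and a center-shift tolerance are assumed here. The names `lloyd`, `two_stage_kmeans`, `sample_fraction`, and `tol` are hypothetical.

```python
import numpy as np


def lloyd(points, centers, max_iter=100, tol=1e-6):
    """Standard k-means (Lloyd) iterations from the given starting centers.

    Returns the final centers and the number of iterations performed.
    """
    centers = np.asarray(centers, dtype=float).copy()
    for it in range(1, max_iter + 1):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # assumed stopping rule: centers stopped moving
            break
    return centers, it


def two_stage_kmeans(points, k, sample_fraction=0.1, seed=0):
    """Fast stage on a random subsample, then slow stage on all points."""
    rng = np.random.default_rng(seed)
    n = len(points)
    m = min(n, max(k, int(sample_fraction * n)))

    # Fast stage: cluster only a small, uniformly sampled portion (assumption).
    sample = points[rng.choice(n, size=m, replace=False)]
    init = sample[rng.choice(m, size=k, replace=False)]
    fast_centers, fast_iters = lloyd(sample, init)

    # Slow stage: refine on the whole dataset, starting from the fast result.
    centers, slow_iters = lloyd(points, fast_centers)
    return centers, fast_iters, slow_iters
```

Each iteration costs on the order of (number of points) × k distance evaluations, so the fast stage is cheap because it sees only `sample_fraction` of the points, and the slow stage is expected to need few full-data iterations because it starts close to the final centers. Comparing `fast_iters` and `slow_iters` on a given dataset is one way to check whether that expectation holds.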
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Salman, R., Kecman, V., Li, Q., Strack, R., Test, E. (2011). Two-Stage Clustering with k-Means Algorithm. In: Özcan, A., Zizka, J., Nagamalai, D. (eds) Recent Trends in Wireless and Mobile Networks. CoNeCo WiMo 2011 2011. Communications in Computer and Information Science, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21937-5_11
DOI: https://doi.org/10.1007/978-3-642-21937-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21936-8
Online ISBN: 978-3-642-21937-5
eBook Packages: Computer Science, Computer Science (R0)