Two-Stage Clustering with k-Means Algorithm

Salman, Raied; Kecman, Vojislav; Li, Qi; Strack, Robert; Test, Erick

doi:10.1007/978-3-642-21937-5_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 162)

Abstract

k-means has recently been recognized as one of the best algorithms for clustering unlabeled data. Because k-means depends mainly on computing distances between all data points and the centers, its cost is high when the dataset is large (for example, more than 500MG points). We propose a two-stage algorithm to reduce the computational cost for huge datasets. The first stage is a fast calculation on a small portion of the data that estimates the best locations of the centers. The second stage is a slow calculation that takes its initial centers from the first stage. The fast and slow stages describe the movement of the centers. In the slow stage the whole dataset can be used to obtain the exact locations of the centers. The fast stage is cheap because the chosen subset is small, and the slow stage is also cheap because, starting from the first-stage centers, it needs only a small number of iterations.
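The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `lloyd` and `two_stage_kmeans`, the uniform random subsample, the `sample_fraction` default, and the stopping tolerance are all assumptions, since the abstract does not say how the small portion of the data is chosen or when each stage stops.

```python
import numpy as np


def lloyd(points, centers, max_iter=100, tol=1e-6):
    """Standard k-means (Lloyd) iterations from the given initial centers.

    Returns the final centers and the number of iterations performed.
    """
    centers = centers.copy()
    for it in range(1, max_iter + 1):
        # Squared distances between every point and every center, shape (n, k).
        # This builds an (n, k, d) array, so it only suits modest sizes.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # centers have effectively stopped moving
            break
    return centers, it


def two_stage_kmeans(points, k, sample_fraction=0.1, seed=0):
    """Fast stage on a subsample, then slow stage on the whole dataset."""
    rng = np.random.default_rng(seed)
    n = len(points)
    m = max(k, int(sample_fraction * n))  # assumed subsample size
    sample = points[rng.choice(n, size=m, replace=False)]

    # Fast stage: cheap iterations on the small subsample.
    init = sample[rng.choice(m, size=k, replace=False)]
    fast_centers, fast_iters = lloyd(sample, init)

    # Slow stage: full dataset, started from the fast-stage centers.
    centers, slow_iters = lloyd(points, fast_centers)
    return centers, fast_iters, slow_iters
```

Calling `two_stage_kmeans(points, k=3)` on a float array of shape `(n, d)` returns the final centers together with the iteration counts of both stages, which makes it easy to check whether the slow stage really does need only a few iterations on a given dataset.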

Author information

Authors and Affiliations

Computer Science Department, Virginia Commonwealth University, 601 West Main Street, Richmond, VA, 23284-3068, USA
Raied Salman, Vojislav Kecman, Qi Li, Robert Strack & Erick Test

Editor information

Editors and Affiliations

Girne American University, Girne, TRNC, Turkey
Abdulkadir Özcan
Mendel University, Brno, Czech Republic
Jan Zizka
Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia
Dhinaharan Nagamalai

About this paper

Cite this paper

Salman, R., Kecman, V., Li, Q., Strack, R., Test, E. (2011). Two-Stage Clustering with k-Means Algorithm. In: Özcan, A., Zizka, J., Nagamalai, D. (eds) Recent Trends in Wireless and Mobile Networks. CoNeCo WiMo 2011. Communications in Computer and Information Science, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21937-5_11

DOI: https://doi.org/10.1007/978-3-642-21937-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21936-8
Online ISBN: 978-3-642-21937-5
eBook Packages: Computer Science, Computer Science (R0)