Identifying stable objects for accelerating the classification phase of k-means

  • Conference paper
Advances on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC 2016)

Abstract

This work presents an improved version of the K-Means algorithm based on a simple heuristic: objects that remain in the same group between the previous and the current iteration are identified and excluded from the calculations of the classification phase in subsequent iterations. To evaluate the improved version against the standard one, three synthetic and seven well-known real instances from the specialized literature were used. Experimental results showed that the proposed heuristic takes less time than the standard algorithm. The best result was obtained when the Transactions instance was grouped into 200 clusters, achieving a time reduction of 90.1% relative to the standard version, with a reduction in grouping quality of only 3.97%.
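The heuristic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no pseudocode, so the function names (`kmeans_stable`, `nearest`, `update_centroids`), the explicit `init` centroids, the squared Euclidean distance, and the choice to keep stable objects in the centroid update are all assumptions made for illustration. An object is treated as stable as soon as its group is the same in two consecutive iterations, and it is then never classified again.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def update_centroids(points, labels, centroids):
    """Mean of each group; an empty group keeps its previous centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans_stable(points, init, max_iter=100):
    """K-Means sketch that skips stable objects in the classification phase.

    Assumption: stable objects still contribute to the centroid update;
    they are only excluded from the distance calculations.
    Returns (labels, centroids, number of distance evaluations).
    """
    centroids = [list(c) for c in init]
    k = len(centroids)
    labels = [nearest(p, centroids) for p in points]  # first full pass
    distance_evals = len(points) * k
    active = list(range(len(points)))  # objects not yet marked stable
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        still_active = []
        for i in active:
            new_label = nearest(points[i], centroids)
            distance_evals += k
            if new_label != labels[i]:
                labels[i] = new_label
                still_active.append(i)  # changed group: classify again
            # unchanged group: the object is stable and dropped from now on
        if not still_active:
            break
        active = still_active
    return labels, centroids, distance_evals


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids, evals = kmeans_stable(pts, init=[(0, 0), (0, 1)])
    print(labels, evals)
```

Because frozen objects are never re-examined, the sketch trades some grouping quality for fewer distance calculations, which is the kind of trade-off the abstract reports.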



Author information

Corresponding author

Correspondence to S. Cervantes.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mexicano, A. et al. (2017). Identifying stable objects for accelerating the classification phase of k-means. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_88

  • DOI: https://doi.org/10.1007/978-3-319-49109-7_88

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49108-0

  • Online ISBN: 978-3-319-49109-7

  • eBook Packages: Engineering, Engineering (R0)