Abstract
This work presents an improved version of the K-Means algorithm based on a simple heuristic: objects that remain in the same group between the previous and the current iteration are identified and excluded from the calculations of the classification phase in subsequent iterations. To evaluate the improved version against the standard one, three synthetic and seven well-known real instances from the specialized literature were used. Experimental results showed that the proposed heuristic takes less time than the standard algorithm. The best result was obtained when the Transactions instance was grouped into 200 clusters, achieving a time reduction of 90.1% with respect to the standard version, at the cost of a grouping quality reduction of only 3.97%.
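The heuristic summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmeans_stable_objects`, the `skip_stable` flag, the use of squared Euclidean distance, caller-supplied initial centroids, and the rule that an object counts as stable as soon as its label repeats once are all assumptions made for illustration. The function also counts distance evaluations so the saving in the classification phase can be observed.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans_stable_objects(points, init_centroids, max_iter=100, skip_stable=True):
    """K-Means sketch that stops reclassifying objects whose group did not change.

    Hypothetical illustration: with skip_stable=False it behaves like
    standard K-Means, which allows a side-by-side comparison.
    Returns (labels, centroids, number_of_distance_evaluations).
    """
    k = len(init_centroids)
    centroids = [list(c) for c in init_centroids]
    labels = [None] * len(points)
    active = set(range(len(points)))  # objects still being reclassified
    distance_evals = 0

    for _ in range(max_iter):
        changed = False
        newly_stable = []

        # Classification phase: only active objects are compared to centroids.
        for i in active:
            new_label = nearest(points[i], centroids)
            distance_evals += k
            if new_label == labels[i]:
                newly_stable.append(i)  # same group as in the previous iteration
            else:
                labels[i] = new_label
                changed = True

        if skip_stable:
            active.difference_update(newly_stable)
        if not changed:
            break

        # Update phase: every object, stable or not, contributes to its centroid.
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return labels, centroids, distance_evals
```

On the toy instance `[(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]` with initial centroids `[(0.0,), (1.0,)]`, both settings of `skip_stable` reach the same grouping, but the sketch performs 28 distance evaluations with the heuristic and 36 without it. Because excluded objects are never reconsidered, the heuristic can in general end with a slightly different grouping than standard K-Means, which is consistent with the small quality reduction reported in the abstract.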
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mexicano, A. et al. (2017). Identifying stable objects for accelerating the classification phase of k-means. In: Xhafa, F., Barolli, L., Amato, F. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2016. Lecture Notes on Data Engineering and Communications Technologies, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-49109-7_88
DOI: https://doi.org/10.1007/978-3-319-49109-7_88
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49108-0
Online ISBN: 978-3-319-49109-7
eBook Packages: Engineering, Engineering (R0)