ABSTRACT
The volume of data generated across different domains has grown enormously in recent years. Data mining and data analytics are the practices used to analyze data and extract hidden knowledge. One of the major data mining methods used for data analysis is data clustering, which makes it easier to extract information from each cluster separately. Many algorithms are used to perform clustering. One of the best known, in use for more than half a century, is k-means. After many optimizations and enhancements, k-means is still considered the most popular clustering algorithm and continues to be used in various domains. This research conducts a Systematic Literature Review (SLR) to collect, classify, and analyze the primary studies on the different versions of the k-means clustering algorithm. The SLR provides a means of finding, appraising, and interpreting existing research pertinent to the topic. By narrowing down the crucial points of debate, we hope to establish a foundation for future research.