Abstract
Evolutionary algorithms have proven effective in a range of machine learning problems, including clustering, which is an active area of research. In this paper, we propose an efficient clustering technique based on the evolutionary behavior of the genetic algorithm and an advanced variant of the nearest neighbor search technique that relies on assignment and election mechanisms. The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes both the separation between different clusters and the cohesion between data points in the same cluster. The proposed algorithm, which we refer to as “EvoNP”, is tested on 15 well-known data sets using 5 well-known external evaluation measures and is compared with 7 well-regarded clustering algorithms. The experiments are conducted in two phases: selecting the best fitness function for the algorithm, and evaluating the algorithm against other clustering algorithms. The results show that the proposed algorithm works well with the silhouette coefficient as its fitness function and outperforms the other algorithms on the majority of the data sets. The source code of EvoNP is available at http://evo-ml.com/evonp/.
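To make the role of the silhouette coefficient as a fitness function more concrete, the following is a minimal sketch, not the EvoNP implementation. It computes the mean silhouette coefficient of a candidate labeling in plain Python, so that a labeling with cohesive, well-separated clusters scores higher than a mixed one. The function names `euclidean` and `silhouette_fitness`, the toy points, and the two candidate labelings are assumptions made for illustration only.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_fitness(points, labels):
    """Mean silhouette coefficient of a labeling; higher is better.

    Hypothetical helper for illustration: an evolutionary search could use
    a score like this to rank candidate clusterings.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention, a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster (cohesion)
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster (separation)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two compact groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_fitness(points, good_labels)
bad_score = silhouette_fitness(points, bad_labels)
print(good_score > bad_score)
```

In this sketch, each candidate solution in a population would be scored with `silhouette_fitness`, and the search would favor the higher-scoring labelings. How EvoNP encodes solutions and applies its nearest neighbor assignment and election steps is not described in the abstract and is not modeled here.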








Cite this article
Qaddoura, R., Faris, H. & Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis. J Ambient Intell Human Comput 12, 8387–8412 (2021). https://doi.org/10.1007/s12652-020-02570-2
DOI: https://doi.org/10.1007/s12652-020-02570-2