An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Evolutionary algorithms have proven effective across a range of machine learning problems, including clustering, which is an active area of research. In this paper, we propose an efficient clustering technique that combines the evolutionary behavior of the genetic algorithm with an advanced variant of the nearest neighbor search technique based on assignment and election mechanisms. The goal of the proposed algorithm is to improve the quality of clustering results by finding a solution that maximizes both the separation between different clusters and the cohesion among data points within the same cluster. The proposed algorithm, which we refer to as “EvoNP”, is tested on 15 well-known data sets using 5 well-known external evaluation measures and is compared with 7 well-regarded clustering algorithms. The experiments are conducted in two phases: selecting the best fitness function for the algorithm, and evaluating the algorithm against the other clustering algorithms. The results show that the proposed algorithm works well with the silhouette coefficient as its fitness function and outperforms the other algorithms on the majority of the data sets. The source code of EvoNP is available at http://evo-ml.com/evonp/.
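
To make the role of the fitness function concrete, the following is a minimal sketch of how a silhouette coefficient (Rousseeuw 1987) could score candidate clusterings so that an evolutionary search can prefer well-separated, cohesive solutions. It is not the EvoNP implementation: the function name `silhouette_fitness`, the Euclidean distance, and the toy data are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def silhouette_fitness(points, labels):
    """Mean silhouette coefficient of a labeling (hypothetical helper).

    For each point i: a(i) is the mean distance to the other members of
    its own cluster (cohesion), b(i) is the smallest mean distance to the
    members of any other cluster (separation), and
    s(i) = (b(i) - a(i)) / max(a(i), b(i)).
    Points in singleton clusters contribute s(i) = 0.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own if j != i)
        a /= len(own) - 1
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Two candidate solutions, as a population member might encode them.
candidates = {
    "grouped_by_proximity": [0, 0, 1, 1],
    "mixed_groups": [0, 1, 0, 1],
}

scores = {name: silhouette_fitness(points, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)
```

Here `best` is the candidate with the highest fitness. In a genetic algorithm, a score of this kind would typically drive selection, so that labelings with higher separation and cohesion are more likely to survive into the next generation.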

Notes

  1. http://scikit-learn.org/stable/modules/clustering.html.

  2. https://pypi.org/project/pyclustering/.

  3. http://yarpiz.com/64/ypml101-evolutionary-clustering.

  4. https://archive.ics.uci.edu/ml/.

  5. http://cs.uef.fi/sipu/datasets/.

  6. https://elki-project.github.io/datasets/.

  7. https://www.naftaliharris.com/blog/visualizing-K-means-clustering/.

Author information

Corresponding author

Correspondence to Ibrahim Aljarah.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qaddoura, R., Faris, H. & Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis. J Ambient Intell Human Comput 12, 8387–8412 (2021). https://doi.org/10.1007/s12652-020-02570-2

  • DOI: https://doi.org/10.1007/s12652-020-02570-2
