EvoCluster: An Open-Source Nature-Inspired Optimization Clustering Framework

  • Original Research
SN Computer Science

Abstract

EvoCluster is an open-source, cross-platform framework implemented in Python that includes the most well-known and recent nature-inspired metaheuristic optimizers, customized to perform partitional clustering tasks. This paper extends the existing EvoCluster framework with different distance measures for the objective function, different techniques for detecting the k value, and a user option to consider either supervised or unsupervised datasets. The current implementation of the framework includes ten metaheuristic optimizers, thirty datasets, five objective functions, twelve evaluation measures, more than twenty distance measures, and ten different ways of detecting the k value. The source code of EvoCluster is publicly available at http://evo-ml.com/evocluster/.
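To make the idea of a metaheuristic optimizer "customized to perform partitional clustering" concrete, the following is a minimal sketch and not EvoCluster's actual API. It assumes each candidate solution is a flat vector holding k centroids, that the objective is the sum of squared distances to the nearest centroid, and that the distance measure is a pluggable function. All names (`evolve_centroids`, `sse`, `decode`, `assign`) are hypothetical, and a simple elitist evolutionary search stands in for the framework's optimizers.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def decode(vector, k, dim):
    """Split a flat candidate vector into k centroids of length dim."""
    return [vector[i * dim:(i + 1) * dim] for i in range(k)]


def assign(points, centroids, distance):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points]


def sse(vector, points, k, distance):
    """Objective: sum of squared distances from each point to its nearest centroid."""
    centroids = decode(vector, k, len(points[0]))
    return sum(min(distance(p, c) for c in centroids) ** 2 for p in points)


def evolve_centroids(points, k, distance=euclidean, pop_size=20,
                     iterations=100, seed=0):
    """Toy elitist evolutionary search over flattened centroid vectors (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    # Mutation step per feature: 10% of the feature range (1.0 if the range is zero).
    steps = [0.1 * ((highs[d] - lows[d]) or 1.0) for d in range(dim)]

    # Initial population: centroids drawn uniformly inside the data's bounding box.
    population = [[rng.uniform(lows[i % dim], highs[i % dim]) for i in range(k * dim)]
                  for _ in range(pop_size)]
    for _ in range(iterations):
        # One Gaussian-mutated child per parent.
        children = [[g + rng.gauss(0, steps[i % dim]) for i, g in enumerate(parent)]
                    for parent in population]
        # Keep the pop_size best individuals among parents and children.
        population = sorted(population + children,
                            key=lambda v: sse(v, points, k, distance))[:pop_size]
    best = population[0]
    return decode(best, k, dim), sse(best, points, k, distance)


# Usage: two well-separated groups of 2-D points, clustered with k = 2.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centroids, score = evolve_centroids(points, k=2)
labels = assign(points, centroids, euclidean)
```

Passing `distance=manhattan` swaps the distance measure without touching the search loop, which is the kind of separation between optimizer, objective function, and distance measure that the abstract describes.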

Notes

  1. http://scikit-learn.org/stable/datasets/index.html.

  2. https://archive.ics.uci.edu/ml/.

  3. http://cs.uef.fi/sipu/datasets/.

  4. https://elki-project.github.io/datasets/.

  5. https://sci2s.ugr.es/keel/datasets.php.

  6. https://www.naftaliharris.com/blog/visualizing-K-means-clustering/.

Acknowledgements

This work has been supported in part by the Ministerio español de Economía y Competitividad under project TIN2017-85727-C4-2-P (UGR-DeepBio).

Author information

Corresponding author

Correspondence to Pedro A. Castillo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Evolution, the New AI Revolution” guest edited by Anikó Ekárt and Anna Isabel Esparcia-Alcázar.

About this article

Cite this article

Qaddoura, R., Faris, H., Aljarah, I. et al. EvoCluster: An Open-Source Nature-Inspired Optimization Clustering Framework. SN COMPUT. SCI. 2, 185 (2021). https://doi.org/10.1007/s42979-021-00511-0

  • DOI: https://doi.org/10.1007/s42979-021-00511-0
