The field of metaheuristics has produced a large number of algorithms for continuous, black-box optimization. In contrast, there are few standard benchmark problem sets, which limits our ability to gain insight into the empirical performance of these algorithms. Clustering problems have been used many times in the literature to evaluate optimization algorithms. However, much of this work has been carried out independently on different problem instances, and the varied experimental methodologies have produced results that are frequently incomparable. These results provide little knowledge of the difficulty of the problems used and no common platform for comparing and evaluating algorithm performance. This paper discusses sum-of-squares clustering problems from the optimization viewpoint. Properties of the fitness landscape are analysed, and it is proposed that these problems are highly suitable for algorithm benchmarking. A set of 27 problem instances (from 4-D to 40-D), based on three well-known datasets, is specified. Baseline experimental results are presented for the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and several other standard algorithms. A web repository has also been created for this problem set to facilitate future use in algorithm evaluation and comparison.
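To make the optimization viewpoint concrete, the following is a minimal sketch of a sum-of-squares clustering objective written as a black-box function. It assumes the common encoding in which the k cluster centers of d-dimensional data are concatenated into one vector of k × d decision variables. The function name, the argument layout, and the toy data are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import Sequence


def sum_of_squares_objective(
    flat_centers: Sequence[float],
    data: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Sum, over all data points, of the squared Euclidean distance
    to the nearest of k cluster centers.

    flat_centers holds the k centers concatenated into a single vector
    of length k * d (assumed encoding, see the lead-in above).
    """
    d = len(data[0])
    if len(flat_centers) != k * d:
        raise ValueError("expected k * d decision variables")
    # Split the flat decision vector back into k centers of dimension d.
    centers = [flat_centers[j * d:(j + 1) * d] for j in range(k)]
    total = 0.0
    for point in data:
        # Each point contributes its squared distance to the closest center.
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        )
    return total


# Hypothetical toy data: four 2-D points forming two obvious groups.
toy_data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two centers in 2-D give a 4-D search space.
centers_a = [0.0, 1.0, 10.0, 1.0]
centers_b = [10.0, 1.0, 0.0, 1.0]  # same centers, listed in the other order

value_a = sum_of_squares_objective(centers_a, toy_data, k=2)
value_b = sum_of_squares_objective(centers_b, toy_data, k=2)
print(value_a, value_b)
```

Under this encoding, reordering the centers leaves the objective value unchanged, so every solution has equivalent permuted copies in the search space. An optimizer sees only the flat vector and the returned value, which is what makes the problem usable as a black-box benchmark.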

Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Additional information
Communicated by B. Xue and A. G. Chen.
M. Gallagher acknowledges the contribution of the Dagstuhl Theory of Evolutionary Algorithms Seminar 13271 (http://www.dagstuhl.de/13271/) to the work in this paper.
Cite this article
Gallagher, M. Towards improved benchmarking of black-box optimization algorithms using clustering problems. Soft Comput 20, 3835–3849 (2016). https://doi.org/10.1007/s00500-016-2094-1