Abstract
Clustering is an important task in data mining. It can be formulated as a global optimization problem that is challenging for existing global optimization techniques even on medium-sized data sets. Various heuristics have been developed to solve the clustering problem. The global \(k\)-means and modified global \(k\)-means algorithms are among the most efficient heuristics for solving the minimum sum-of-squares clustering problem. However, these algorithms do not always find global or near-global solutions to the clustering problem. In this paper, we introduce a new algorithm that improves the accuracy of the modified global \(k\)-means algorithm in finding global solutions. We use an auxiliary cluster problem to generate a set of initial points and apply the \(k\)-means algorithm starting from these points to find global solutions to the clustering problem. Numerical results on 16 real-world data sets demonstrate the superiority of the proposed algorithm over the global and modified global \(k\)-means algorithms in finding global solutions to clustering problems.
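The incremental idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes one common form of the auxiliary cluster function (the sum-of-squares objective obtained if a point \(y\) were added as a new center), restricts candidate starting points to the data points themselves, and uses hypothetical names throughout (`auxiliary_value`, `incremental_clustering`, `n_candidates`).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centers):
    """Sum-of-squares objective: each point pays its squared distance
    to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd) iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep empty ones.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def auxiliary_value(points, nearest, y):
    """Assumed auxiliary function: objective value if y became a new center,
    where nearest[i] is the squared distance of point i to current centers."""
    return sum(min(d, sq_dist(p, y)) for p, d in zip(points, nearest))

def incremental_clustering(points, k, n_candidates=3):
    """Build k clusters one at a time, trying several starting points
    for each new center and keeping the best k-means result."""
    # One cluster: the centroid of the whole data set.
    centers = [tuple(sum(col) / len(points) for col in zip(*points))]
    for _ in range(2, k + 1):
        nearest = [min(sq_dist(p, c) for c in centers) for p in points]
        # Rank distinct data points by how much they would lower the objective.
        candidates = sorted(
            dict.fromkeys(points),
            key=lambda y: auxiliary_value(points, nearest, y),
        )
        best = None
        for y in candidates[:n_candidates]:
            trial = kmeans(points, centers + [y])
            if best is None or sse(points, trial) < sse(points, best):
                best = trial
        centers = best
    return centers

# Toy data (made up for illustration): three well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]
result = incremental_clustering(data, k=3)
```

Trying several starting points per step, rather than a single one, is what distinguishes this kind of scheme from a purely greedy incremental method; the number of candidates trades accuracy against running time.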
Acknowledgments
This paper was written during the visit of Dr. Burak Ordin to the University of Ballarat, supported by The Scientific and Technological Research Council of Turkey (TUBITAK). The research by Adil M. Bagirov was supported under the Australian Research Council’s Discovery Projects funding scheme (Project No. DP140103213). The authors are grateful to two anonymous referees for their criticism and comments, which helped to significantly improve the quality of the paper.
Cite this article
Ordin, B., Bagirov, A.M. A heuristic algorithm for solving the minimum sum-of-squares clustering problems. J Glob Optim 61, 341–361 (2015). https://doi.org/10.1007/s10898-014-0171-5
DOI: https://doi.org/10.1007/s10898-014-0171-5