A heuristic algorithm for solving the minimum sum-of-squares clustering problems

Journal of Global Optimization

Abstract

Clustering is an important task in data mining. It can be formulated as a global optimization problem, which is challenging for existing global optimization techniques even on medium-sized data sets. Various heuristics have been developed to solve the clustering problem. The global \(k\)-means and modified global \(k\)-means algorithms are among the most efficient heuristics for solving the minimum sum-of-squares clustering problem. However, these algorithms do not always find global or near-global solutions to the clustering problem. In this paper, we introduce a new algorithm that improves the accuracy of the modified global \(k\)-means algorithm in finding global solutions. We use an auxiliary cluster problem to generate a set of initial points and apply the \(k\)-means algorithm starting from these points to find the global solution to the clustering problem. Numerical results on 16 real-world data sets clearly demonstrate the superiority of the proposed algorithm over the global and modified global \(k\)-means algorithms in finding global solutions to clustering problems.
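The incremental idea behind the global and modified global \(k\)-means algorithms can be illustrated with a small pure-Python sketch. For a data set \(A\) of \(m\) points, the minimum sum-of-squares objective is \(f_k(x_1,\dots,x_k) = \frac{1}{m}\sum_{a \in A} \min_{j} \|x_j - a\|^2\). Given the \(k-1\) centers already found, let \(d_{k-1}^a\) be the squared distance from \(a\) to its nearest center; the auxiliary cluster function for placing a new center \(y\) is then \(\bar f_k(y) = \frac{1}{m}\sum_{a \in A} \min\{d_{k-1}^a, \|y - a\|^2\}\). The sketch below is a hypothetical simplification, not the authors' algorithm: it scores every data point by how much it would lower \(\bar f_k\) as the new center, refines the `n_starts` best candidates by a \(k\)-means-like descent on \(\bar f_k\) with the existing centers fixed, runs \(k\)-means from each resulting set of \(k\) centers, and keeps the solution with the lowest objective. The function names, the `n_starts` parameter and the candidate-selection rule are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def mssc_objective(data, centers):
    """Minimum sum-of-squares objective: mean squared distance to the nearest center."""
    return sum(min(sq_dist(a, c) for c in centers) for a in data) / len(data)


def k_means(data, centers, max_iter=100):
    """Standard k-means (Lloyd) iterations started from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: put every point in the group of its nearest center.
        groups = [[] for _ in centers]
        for a in data:
            j = min(range(len(centers)), key=lambda i: sq_dist(a, centers[i]))
            groups[j].append(a)
        # Update step: move each center to its group's centroid (empty groups keep theirs).
        new_centers = [mean_point(g) if g else c for c, g in zip(centers, groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def refine_on_auxiliary(data, d, y, max_iter=100):
    """k-means-like descent on the auxiliary function over the new center y only.

    d[i] is the squared distance of data[i] to its nearest existing center; points
    strictly closer to y than that are attracted to y, and y moves to their centroid.
    """
    for _ in range(max_iter):
        attracted = [b for b, db in zip(data, d) if sq_dist(y, b) < db]
        if not attracted:
            break
        new_y = mean_point(attracted)
        if new_y == y:
            break
        y = new_y
    return y


def incremental_clustering(data, k, n_starts=5):
    """Hypothetical sketch: add centers one at a time, seeding k-means from the auxiliary problem."""
    centers = [mean_point(data)]  # the 1-cluster solution is the data centroid
    for _ in range(1, k):
        d = [min(sq_dist(b, c) for c in centers) for b in data]
        # Score each data point by how much it would lower the auxiliary function as a new center.
        scores = [sum(max(0.0, db - sq_dist(a, b)) for b, db in zip(data, d)) for a in data]
        ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
        best_f, best_centers = None, None
        for i in ranked[:n_starts]:
            y = refine_on_auxiliary(data, d, list(data[i]))
            candidate = k_means(data, centers + [y])
            f = mssc_objective(data, candidate)
            if best_f is None or f < best_f:
                best_f, best_centers = f, candidate
        centers = best_centers
    return centers
```

For example, on the eight points \((0,0), (0,1), (1,0), (1,1), (10,10), (10,11), (11,10), (11,11)\), `incremental_clustering(data, 2)` returns the centers \((10.5, 10.5)\) and \((0.5, 0.5)\) with objective value 0.5. Scoring every data point costs \(O(m^2)\) distance evaluations per added center, which is acceptable for an illustration but not for large data sets.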

Acknowledgments

This paper was written during Dr. Burak Ordin's visit to the University of Ballarat, supported by the Scientific and Technological Research Council of Turkey (TUBITAK). The research by Adil M. Bagirov was supported under the Australian Research Council's Discovery Projects funding scheme (Project No. DP140103213). The authors are grateful to two anonymous referees, whose criticism and comments helped them significantly improve the quality of the paper.

Author information

Correspondence to Burak Ordin.

About this article

Cite this article

Ordin, B., Bagirov, A.M. A heuristic algorithm for solving the minimum sum-of-squares clustering problems. J Glob Optim 61, 341–361 (2015). https://doi.org/10.1007/s10898-014-0171-5

  • DOI: https://doi.org/10.1007/s10898-014-0171-5