
A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem

Journal of Global Optimization

Abstract

Cluster analysis is concerned with sorting data points into clusters so as to optimize a given criterion. Rapid advances in technology have made it possible to address clustering problems via optimization theory. In this paper, we present a global optimization algorithm for the hard clustering problem, in which each data point is assigned to exactly one cluster. The problem is formulated as a nonlinear program, for which a tight linear programming relaxation is constructed via the Reformulation-Linearization Technique (RLT), together with additional valid inequalities that defeat the inherent symmetry in the problem. This construct is embedded within a specialized branch-and-bound algorithm that solves the problem to global optimality. Implementation issues that can enhance the efficiency of the branch-and-bound algorithm are also discussed. Computational experience is reported on several standard data sets from the literature as well as on larger, synthetically generated problem instances. The results validate the robustness of the proposed procedure and exhibit its dominance over the popular k-means clustering technique. Finally, a heuristic procedure that obtains a good-quality solution with relatively little computational effort is described.
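To make the gap between a globally optimal hard clustering and a k-means solution concrete, the following is a minimal sketch, not the RLT-based branch-and-bound of the paper. It assumes the common sum-of-squared-Euclidean-distances criterion (the preview does not state the paper's exact objective), finds the global optimum of a four-point instance by plain enumeration, and compares it with Lloyd's k-means iteration started from a deliberately poor set of centers. All function names and the data are hypothetical. The `is_canonical` filter is only a simple illustration of why symmetry matters: relabeling clusters yields equivalent solutions, so only one representative per relabeling is enumerated. It is not the valid inequalities used by the authors.

```python
from itertools import product


def sse(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(x) / len(members) for x in zip(*members)]
        total += sum((pi - ci) ** 2 for p in members for pi, ci in zip(p, centroid))
    return total


def is_canonical(labels):
    """Keep one labeling per cluster relabeling: each new label is at most max-so-far + 1."""
    highest = -1
    for l in labels:
        if l > highest + 1:
            return False
        highest = max(highest, l)
    return True


def brute_force_hard_clustering(points, k):
    """Enumerate all canonical hard assignments; only viable for tiny instances."""
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if not is_canonical(labels):
            continue
        cost = sse(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


def lloyd_kmeans(points, centers, iters=100):
    """Plain Lloyd iteration from the given initial centers; returns the final labels."""
    centers = list(centers)
    k = len(centers)
    labels = None
    for _ in range(iters):
        new_labels = tuple(
            min(range(k), key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centers[c])))
            for p in points
        )
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Corners of a 3-by-1 rectangle; the best 2-clustering splits left from right.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
best_labels, best_cost = brute_force_hard_clustering(points, k=2)

# Initial centers on the bottom and top edges trap k-means in a bottom/top split.
km_labels = lloyd_kmeans(points, [(1.5, 0.0), (1.5, 1.0)])
km_cost = sse(points, km_labels, 2)

print("global optimum:", best_labels, best_cost)
print("k-means result:", km_labels, km_cost)
```

Enumeration grows exponentially with the number of points, which is why an exact method needs bounding machinery such as the relaxation described in the abstract; the sketch only shows that a k-means fixed point can be far from the global optimum.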



Author information


Correspondence to Hanif D. Sherali.


About this article

Cite this article

Sherali, H.D., Desai, J. A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem. J Glob Optim 32, 281–306 (2005). https://doi.org/10.1007/s10898-004-2706-7


  • DOI: https://doi.org/10.1007/s10898-004-2706-7
