Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem

Journal of Global Optimization

Abstract

In this article we present a computational study for solving the distance-dependent rearrangement clustering problem using mixed-integer linear programming (MILP). To address sparse data sets, we present an objective function for evaluating the pair-wise interactions between two elements as a function of the distance between them in the final ordering. The physical permutations of the rows and columns of the data matrix can be modeled using mixed-integer linear programming and we present three models based on (1) the relative ordering of elements, (2) the assignment of elements to a final position, and (3) the assignment of a distance between a pair of elements. These models can be augmented with the use of cutting planes and heuristic methods to increase computational efficiency. The performance of the models is compared for three distinct re-ordering problems corresponding to glass transition temperature data for polymers and two drug inhibition data matrices. The results of the comparative study suggest that the assignment model is the most effective for identifying the optimal re-ordering of rows and columns of sparse data matrices.
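To make the distance-dependent objective and the position-assignment view concrete, the following is a minimal sketch, not the MILP formulation from the article. The abstract does not give the exact objective, so everything here is an assumption for illustration: the names `ordering_cost`, `best_ordering`, and `inverse_distance`, the absolute-difference dissimilarity, and the 1/d weighting. It scores an ordering by summing, over every pair of elements, a dissimilarity scaled by a weight that decays with the pair's separation in the final ordering, and finds the best ordering of a toy data set by brute-force enumeration.

```python
from itertools import permutations


def inverse_distance(d):
    # Assumed weighting: pairs that end up close together count more.
    return 1.0 / d


def ordering_cost(order, dissim, weight):
    """Sum weight(gap) * dissim[a][b] over all pairs in `order`.

    `order[p]` is the element placed at position p, so the gap between
    two elements is the difference of their positions.
    """
    total = 0.0
    n = len(order)
    for p in range(n):
        for q in range(p + 1, n):
            total += weight(q - p) * dissim[order[p]][order[q]]
    return total


def best_ordering(dissim, weight):
    # Exhaustive search over all n! orderings: only viable for tiny n,
    # which is why an exact method such as MILP is needed in practice.
    n = len(dissim)
    return min(permutations(range(n)),
               key=lambda o: ordering_cost(o, dissim, weight))


# Toy data (hypothetical): four elements described by a single value each.
values = [0, 10, 1, 11]
dissim = [[abs(a - b) for b in values] for a in values]

best = best_ordering(dissim, inverse_distance)
print(best, ordering_cost(best, dissim, inverse_distance))
```

Under these assumptions the lowest-cost ordering places elements with similar values next to each other. A MILP model replaces the enumeration with binary variables, for example one per element-position pair in an assignment-style model, together with linear constraints.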

Author information

Corresponding author

Correspondence to Christodoulos A. Floudas.

About this article

Cite this article

McAllister, S.R., DiMaggio, P.A. & Floudas, C.A. Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. J Glob Optim 45, 111–129 (2009). https://doi.org/10.1007/s10898-008-9393-8

  • DOI: https://doi.org/10.1007/s10898-008-9393-8
