Abstract
In this article we present a computational study of solving the distance-dependent rearrangement clustering problem using mixed-integer linear programming (MILP). To address sparse data sets, we introduce an objective function that evaluates the pairwise interaction between two elements as a function of the distance between them in the final ordering. The permutations of the rows and columns of the data matrix can be modeled with MILP, and we present three models based on (1) the relative ordering of elements, (2) the assignment of elements to final positions, and (3) the assignment of a distance to each pair of elements. These models can be augmented with cutting planes and heuristic methods to increase computational efficiency. We compare the performance of the models on three distinct reordering problems: glass transition temperature data for polymers and two drug inhibition data matrices. The results of this comparative study suggest that the assignment model is the most effective at identifying the optimal reordering of the rows and columns of sparse data matrices.
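The abstract does not give the exact objective function or the constraints of the three formulations, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a dissimilarity c_ij between rows i and j computed only over columns observed in both rows (so missing entries of a sparse matrix are skipped), weighted by a hypothetical decreasing function w(d) = 1/d of the distance d between the two rows in the final ordering, with lower totals being better. The same objective is evaluated both from an ordering and from assignment variables y[i][p] (the kind of variables used in an assignment-based model), and the optimum is found by exhaustive enumeration, not by MILP, which is feasible only for a handful of rows. All function names (`pairwise_cost`, `distance_weight`, `objective`, `objective_assignment`, `best_order`) are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

# None marks a missing entry in the sparse data matrix.
Matrix = List[List[Optional[float]]]


def pairwise_cost(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Mean squared difference over columns observed in both rows.

    Assumption: a pair with no co-observed column gets cost 0 (no evidence).
    """
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


def cost_matrix(data: Matrix) -> List[List[float]]:
    n = len(data)
    return [[pairwise_cost(data[i], data[j]) for j in range(n)] for i in range(n)]


def distance_weight(d: int) -> float:
    """Hypothetical weight: a pair placed d >= 1 positions apart counts 1/d."""
    return 1.0 / d


def objective(order: Sequence[int], costs: List[List[float]]) -> float:
    """Distance-dependent objective of one row ordering (lower is better)."""
    position = {row: p for p, row in enumerate(order)}
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += costs[i][j] * distance_weight(abs(position[i] - position[j]))
    return total


def objective_assignment(y: List[List[int]], costs: List[List[float]]) -> float:
    """Same objective in assignment variables: y[i][p] = 1 if row i is at position p.

    The product y[i][p] * y[j][q] is the quadratic term that an
    assignment-based MILP would have to linearize.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            for p in range(n):
                for q in range(n):
                    if p != q:  # distinct rows never share a position
                        total += costs[i][j] * distance_weight(abs(p - q)) * y[i][p] * y[j][q]
    return total


def best_order(data: Matrix) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive search over all n! row orderings (tiny matrices only)."""
    costs = cost_matrix(data)
    best = min(itertools.permutations(range(len(data))), key=lambda o: objective(o, costs))
    return best, objective(best, costs)


if __name__ == "__main__":
    data: Matrix = [
        [1.0, None, 3.0],
        [9.0, 8.0, None],
        [1.5, 2.0, None],
        [None, 8.5, 9.5],
    ]
    order, value = best_order(data)
    print(order, round(value, 2))  # (0, 2, 3, 1) 108.27

    # Encode the ordering as assignment variables and check both forms agree.
    y = [[1 if order[p] == i else 0 for p in range(len(order))] for i in range(len(order))]
    assert math.isclose(objective_assignment(y, cost_matrix(data)), value)
```

On this toy matrix the most similar pairs (rows 0 and 2, rows 1 and 3) end up adjacent, while the most dissimilar pair (rows 0 and 1) is placed at opposite ends. Because the number of orderings grows as n!, enumeration like this is only useful for checking a solver's answer on very small instances, which is why formulations such as the MILP models described above are needed for realistic data matrices.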
Cite this article
McAllister, S.R., DiMaggio, P.A. & Floudas, C.A. Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. J Glob Optim 45, 111–129 (2009). https://doi.org/10.1007/s10898-008-9393-8