Abstract
The quartet method is a novel hierarchical clustering approach in which, given a set of n data objects and their pairwise dissimilarities, the aim is to construct an optimal tree from the full set of possible quartet topologies on the n objects. Here optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This is an NP-hard combinatorial optimization problem, known as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem and propose a basic greedy heuristic built on several insights for speeding up and simplifying solution generation and evaluation: the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of the coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure that greedily merges highly connected objects that may yield lower values of the quartet cost function in a given partial solution. We show that this basic greedy heuristic consistently improves the performance of two popular quartet clustering algorithms from the literature, a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel, efficient solution approaches to the MQTC problem.
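To make the MQTC objective concrete, the sketch below computes the quartet tree cost C_T of a given tree and finds the optimum by exhaustive search for very small n. It assumes the usual quartet-method convention, which this abstract does not spell out: the cost of topology uv|wx is d(u,v) + d(w,x), and a tree embeds uv|wx when the path between u and v is disjoint from the path between w and x. All function names (`tree_distances`, `quartet_tree_cost`, `all_trees`, `brute_force_mqtc`) are hypothetical. The exhaustive search is only a reference for tiny instances. It is not the greedy heuristic proposed in the paper.

```python
from collections import deque
from itertools import combinations


def tree_distances(adj):
    """All-pairs path lengths (number of edges) in a tree, via BFS from every node."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def quartet_tree_cost(adj, leaves, D):
    """C_T: sum over all 4-subsets of leaves of the cost of the topology embedded in T.

    Assumption: the cost of topology uv|wx is D[u][v] + D[w][x].
    """
    T = tree_distances(adj)
    total = 0.0
    for a, b, c, e in combinations(leaves, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        # In a binary tree the embedded pairing is the unique one whose two paths
        # are disjoint, i.e. the one with the strictly smallest summed path length.
        (u, v), (w, x) = min(
            pairings, key=lambda p: T[p[0][0]][p[0][1]] + T[p[1][0]][p[1][1]]
        )
        total += D[u][v] + D[w][x]
    return total


def all_trees(n):
    """Yield every unrooted binary tree on leaves 0..n-1 (n >= 3).

    Trees are adjacency dicts; internal nodes are labelled with negative ints.
    """
    def grow(adj, leaf, internal):
        if leaf == n:
            yield adj
            return
        for u, v in [(u, v) for u in adj for v in adj[u] if u < v]:
            new = {k: set(s) for k, s in adj.items()}
            # Subdivide edge (u, v) with a new internal node and attach the next leaf to it.
            new[u].discard(v)
            new[v].discard(u)
            new[internal] = {u, v, leaf}
            new[u].add(internal)
            new[v].add(internal)
            new[leaf] = {internal}
            yield from grow(new, leaf + 1, internal - 1)

    yield from grow({-1: {0, 1, 2}, 0: {-1}, 1: {-1}, 2: {-1}}, 3, -2)


def brute_force_mqtc(D):
    """Exhaustive MQTC for tiny n: returns (minimum cost, tree adjacency)."""
    leaves = list(range(len(D)))
    return min(
        ((quartet_tree_cost(adj, leaves, D), adj) for adj in all_trees(len(D))),
        key=lambda pair: pair[0],
    )
```

For example, with D = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]] there is a single quartet, and `brute_force_mqtc(D)` selects the tree embedding 01|23 with cost 2. The number of unrooted binary trees on n labelled leaves is (2n − 5)!!, for example 105 for n = 6, so exhaustive search quickly becomes infeasible. This growth is why heuristics such as those discussed in the abstract are needed.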
Notes
MQTC problem instances: https://sites.google.com/site/quartetmethod/home/datasets.
Acknowledgements
The author, Dr. Sergio Consoli, dedicates this work with deepest respect to the memory of Professor Kenneth Darby-Dowman: a great scientist, an excellent manager, the best supervisor, a wonderful person, and a real friend.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Consoli, S., Korst, J., Pauws, S. et al. Improved metaheuristics for the quartet method of hierarchical clustering. J Glob Optim 78, 241–270 (2020). https://doi.org/10.1007/s10898-019-00871-1
DOI: https://doi.org/10.1007/s10898-019-00871-1