Improved metaheuristics for the quartet method of hierarchical clustering

Consoli, Sergio; Korst, Jan; Pauws, Steffen; Geleijnse, Gijs

doi:10.1007/s10898-019-00871-1

Published: 13 January 2020

Volume 78, pages 241–270 (2020)

Journal of Global Optimization

Sergio Consoli¹ (ORCID: orcid.org/0000-0001-7357-5858), Jan Korst¹, Steffen Pauws¹,² & Gijs Geleijnse³

Abstract

The quartet method is a novel hierarchical clustering approach in which, given a set of n data objects and their pairwise dissimilarities, the aim is to construct an optimal tree from all possible combinations of quartet topologies on the n objects, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem, and propose a basic greedy heuristic that incorporates several techniques for speeding up and simplifying solution generation and evaluation: the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure, which greedily merges highly connected objects that may bring lower values of the quartet cost function in a given partial solution. We show that this basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel and efficient solution approaches to the MQTC problem.
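
To make the objective concrete, the following is a minimal sketch of how the quartet cost of one candidate tree could be evaluated. It is not the authors' implementation and uses none of the matrix-based shortcuts mentioned above. It assumes the convention common in the quartet-method literature that a topology uv|wx has cost d(u,v) + d(w,x), and that the candidate is a fully resolved tree (every internal node has degree three), so exactly one of the three topologies of each quartet is embedded. The names `hop_distances`, `quartet_tree_cost`, the adjacency-list representation and the toy dissimilarities are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def quartet_tree_cost(adj, leaves, d):
    """Sum d(u,v) + d(w,x) over every quartet topology uv|wx embedded in the tree.

    Assumes a fully resolved tree, so each quartet has a unique embedded topology.
    """
    hops = {leaf: hop_distances(adj, leaf) for leaf in leaves}

    def path_sum(pairing):
        (u, v), (w, x) = pairing
        return hops[u][v] + hops[w][x]

    total = 0
    for a, b, c, e in combinations(leaves, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        # The embedded pairing is the one whose two tree paths are disjoint,
        # which is the pairing with the smallest summed path length.
        (u, v), (w, x) = min(pairings, key=path_sum)
        total += d[frozenset((u, v))] + d[frozenset((w, x))]
    return total


# Toy instance (invented data): five leaves, three internal nodes i1..i3.
leaves = ["a", "b", "c", "e", "f"]
adj = {
    "a": ["i1"], "b": ["i1"], "c": ["i2"], "e": ["i3"], "f": ["i3"],
    "i1": ["a", "b", "i2"],
    "i2": ["i1", "c", "i3"],
    "i3": ["i2", "e", "f"],
}
pair_values = {
    ("a", "b"): 1, ("e", "f"): 1,
    ("a", "c"): 4, ("b", "c"): 4, ("c", "e"): 5, ("c", "f"): 5,
    ("a", "e"): 9, ("a", "f"): 9, ("b", "e"): 9, ("b", "f"): 9,
}
d = {frozenset(pair): value for pair, value in pair_values.items()}

cost = quartet_tree_cost(adj, leaves, d)
print(cost)  # 24
```

In this toy instance the tree groups the two similar pairs (a, b) and (e, f) as siblings and has cost 24; a tree that instead pairs a with e and b with f scores 72 under the same dissimilarities, which is the kind of difference a search heuristic for the MQTC problem tries to exploit. This brute-force evaluation enumerates all quartets and is only practical for small n.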

Notes

MQTC problem instances: https://sites.google.com/site/quartetmethod/home/datasets.

Acknowledgements

The author Dr. Sergio Consoli wants to dedicate this work with deepest respect to the memory of Professor Kenneth Darby-Dowman, a great scientist, an excellent manager, the best supervisor, a wonderful person, a real friend.

Author information

Authors and Affiliations

Philips Research, High Tech Campus 34, 5656 AE, Eindhoven, The Netherlands
Sergio Consoli, Jan Korst & Steffen Pauws
TiCC, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands
Steffen Pauws
Netherlands Comprehensive Cancer Organisation (IKNL), Zernikestraat 29, 5612 HZ, Eindhoven, The Netherlands
Gijs Geleijnse

Corresponding author

Correspondence to Sergio Consoli.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Consoli, S., Korst, J., Pauws, S. et al. Improved metaheuristics for the quartet method of hierarchical clustering. J Glob Optim 78, 241–270 (2020). https://doi.org/10.1007/s10898-019-00871-1

Received: 27 December 2018
Accepted: 30 December 2019
Published: 13 January 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s10898-019-00871-1
