Improved metaheuristics for the quartet method of hierarchical clustering

Journal of Global Optimization

Abstract

The quartet method is a novel hierarchical clustering approach in which, given a set of n data objects and their pairwise dissimilarities, the aim is to construct an optimal tree from among all possible combinations of quartet topologies on the n objects, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We formulate this challenging problem in detail and propose a basic greedy heuristic built on several insights that speed up and simplify solution generation and evaluation: the use of adjacency-like matrices to represent the topology of candidate solutions; fast calculation of the coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure that greedily merges highly connected objects whose merging may lower the quartet cost function in a given partial solution. We show that this basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms from the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, yielding novel and efficient solution approaches to the MQTC problem.
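
The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the definition usual in the quartet method literature, where the cost of a quartet topology ab|cd is d(a,b) + d(c,d), and it assumes the tree is an unrooted binary tree (every internal node has degree three), so that each group of four leaves embeds exactly one of its three possible topologies. The names `hop_distances`, `embedded_topology`, and `quartet_tree_cost`, as well as the toy tree and dissimilarities, are hypothetical and chosen only for illustration. The embedded topology is found by brute force through the four-point condition on edge counts, which is far slower than the matrix-based evaluation the abstract mentions.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embedded_topology(hops, a, b, c, d):
    """Return the pairing of {a, b, c, d} that the tree embeds.

    In a binary tree, the embedded pairing xy|zw is the one whose two
    paths x..y and z..w are disjoint, i.e. the pairing with the
    smallest total hop count (four-point condition).
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(
        pairings,
        key=lambda p: hops[p[0][0]][p[0][1]] + hops[p[1][0]][p[1][1]],
    )


def quartet_tree_cost(adj, leaves, dissim):
    """Sum the costs of all quartet topologies embedded in the tree."""
    hops = {leaf: hop_distances(adj, leaf) for leaf in leaves}
    total = 0.0
    for a, b, c, d in combinations(leaves, 4):
        (x, y), (z, w) = embedded_topology(hops, a, b, c, d)
        total += dissim[frozenset((x, y))] + dissim[frozenset((z, w))]
    return total


# Toy instance (illustrative values only): a and b are similar, d and e are similar.
leaves = ["a", "b", "c", "d", "e"]
dissim = {frozenset(p): 4.0 for p in combinations(leaves, 2)}
dissim[frozenset(("a", "b"))] = 1.0
dissim[frozenset(("d", "e"))] = 1.0

# Tree that groups (a, b) and (d, e): internal nodes u, v, w.
adj = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["w"], "e": ["w"],
    "u": ["a", "b", "v"], "v": ["u", "c", "w"], "w": ["v", "d", "e"],
}
# Same shape, but with leaves b and d swapped, separating the similar pairs.
adj_swapped = {
    "a": ["u"], "d": ["u"], "c": ["v"], "b": ["w"], "e": ["w"],
    "u": ["a", "d", "v"], "v": ["u", "c", "w"], "w": ["v", "b", "e"],
}

print(quartet_tree_cost(adj, leaves, dissim))          # 22.0
print(quartet_tree_cost(adj_swapped, leaves, dissim))  # 40.0
```

The tree that keeps the similar pairs together has the lower cost (22 against 40), which is the behaviour the MQTC objective rewards. Enumerating all quartets takes time proportional to n^4, so this sketch is only practical for small n.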

Notes

  1. MQTC problem instances: https://sites.google.com/site/quartetmethod/home/datasets.

Acknowledgements

The author Dr. Sergio Consoli wants to dedicate this work with deepest respect to the memory of Professor Kenneth Darby-Dowman, a great scientist, an excellent manager, the best supervisor, a wonderful person, a real friend.

Author information

Corresponding author

Correspondence to Sergio Consoli.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Consoli, S., Korst, J., Pauws, S. et al. Improved metaheuristics for the quartet method of hierarchical clustering. J Glob Optim 78, 241–270 (2020). https://doi.org/10.1007/s10898-019-00871-1

  • DOI: https://doi.org/10.1007/s10898-019-00871-1
