Abstract
The benefits of experimental algorithmics and algorithm engineering need to be extended to applications in the computational sciences. In this paper, we focus on one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics. As motivating examples and as illustrations of possible approaches, we briefly discuss two specific uses of algorithm engineering and experimental algorithmics from our recent research. The first use focused on speed: we reimplemented Sankoff and Blanchette’s breakpoint analysis and obtained a 200,000-fold speedup for serial code and a 10^8-fold speedup on a 512-processor supercluster. We report here on the techniques used to obtain these speedups. The second use focused on experimentation: we conducted an extensive study of quartet-based reconstruction algorithms within a parameter-rich simulation space, using several hundred CPU-years of computation. We report here on the challenges involved in designing, conducting, and assessing such a study.
References
L. Arge, J. Chase, J. S. Vitter, and R. Wickremesinghe. Efficient sorting using registers and caches. In Proceedings of the 4th Workshop on Algorithm Engineering (WAE’00). Springer Lecture Notes in Computer Science 1982, 2000.
D. A. Bader and B. M. E. Moret. GRAPPA runs in record time. HPC Wire, 9(47), 2000.
V. Berry, D. Bryant, T. Jiang, P. Kearney, M. Li, T. Wareham, and H. Zhang. A practical algorithm for recovering the best supported edges of an evolutionary tree. In Proceedings of the 11th ACM/SIAM Symposium on Discrete Algorithms (SODA’00), pages 287–296, 2000.
V. Berry and O. Gascuel. Inferring evolutionary trees with strong combinatorial evidence. Theoretical Computer Science, 240(2):271–298, 2000.
V. Berry, T. Jiang, P. Kearney, M. Li, and T. Wareham. Quartet cleaning: improved algorithms and simulations. In Proceedings of the 7th European Symposium on Algorithms (ESA’99). Springer Lecture Notes in Computer Science 1643, pages 313–324, 1999.
M. Blanchette, G. Bourque, and D. Sankoff. Breakpoint phylogenies. In S. Miyano and T. Takagi, editors, Genome Informatics 1997, pages 25–34. Univ. Academy Press, Tokyo, 1997.
A. Caprara. On the practical solution of the reversal median problem. In Proceedings of the 1st Workshop on Algorithms for Bioinformatics (WABI’01). Springer Lecture Notes in Computer Science 2149, pages 238–251, 2001.
J. I. Cohen. Epstein-Barr virus infection. New England Journal of Medicine, 343(7):481–492, 2000.
M. E. Cosner, R. K. Jansen, B. M. E. Moret, L. A. Raubeson, L.-S. Wang, T. Warnow, and S. K. Wyman. An empirical comparison of phylogenetic methods on chloroplast gene order data in Campanulaceae. In D. Sankoff and J. Nadeau, editors, Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, pages 99–121. Kluwer, 2000.
M. E. Cosner, R. K. Jansen, B. M. E. Moret, L. A. Raubeson, L.-S. Wang, T. Warnow, and S. K. Wyman. A new fast heuristic for computing the breakpoint phylogeny and experimental phylogenetic analyses of real and synthetic data. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB’00), pages 104–115, 2000.
N. Eiron, M. Rodeh, and I. Steinwarts. Matrix multiplication: a case study of enhanced data cache utilization. ACM Journal of Experimental Algorithmics, 4(3), 1999. Online at http://www.jea.acm.org/1999/EironMatrix/.
P. Erdős, M. A. Steel, L. A. Székely, and T. Warnow. A few logs suffice to build (almost) all trees I. Random Structures and Algorithms, 14:153–184, 1997.
D. Huson, S. Nettles, K. Rice, T. Warnow, and S. Yooseph. Hybrid tree reconstruction methods. ACM Journal of Experimental Algorithmics, 4(5), 1999. Online at http://www.jea.acm.org/1999/HusonHybrid/.
T. Jiang, P. E. Kearney, and M. Li. A polynomial-time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM Journal on Computing. To appear.
D. S. Johnson and L. A. McGeoch. The traveling salesman problem: a case study. In E. Aarts and J.K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 215–310. John Wiley, 1997.
T. H. Jukes and C. Cantor. Mammalian Protein Metabolism. Academic Press, 1969.
P. J. Keeling, M. A. Luker, and J. D. Palmer. Evidence from beta-tubulin phylogeny that microsporidia evolved from within the Fungi. Molecular Biology and Evolution, 17:23–31, 2000.
R. Ladner, J. D. Fix, and A. LaMarca. The cache performance of traversals and random accesses. In Proceedings of the 10th ACM/SIAM Symposium on Discrete Algorithms (SODA’99), pages 613–622, 1999.
A. LaMarca and R. Ladner. The influence of caches on the performance of heaps. ACM Journal of Experimental Algorithmics, 1(4), 1996. Online at http://www.jea.acm.org/1996/LaMarcaInfluence/.
A. LaMarca and R. Ladner. The influence of caches on the performance of sorting. In Proceedings of the 8th ACM/SIAM Symposium on Discrete Algorithms (SODA’97), pages 370–379, 1997.
C. C. McGeoch. Analyzing algorithms by simulation: variance reduction techniques and simulation speedups. ACM Computing Surveys, 24:195–212, 1992.
B. Misof, C. L. Anderson, and H. Hadrys. A phylogeny of the damselfly genus Calopteryx (Odonata) using mitochondrial 16S rDNA markers. Molecular Phylogenetics and Evolution, 15:5–14, 2000.
B. M. E. Moret, D. A. Bader, and T. Warnow. High-performance algorithm engineering for computational phylogenetics. In Proceedings of the 2001 International Conference on Computational Science (ICCS’01). Springer Lecture Notes in Computer Science 2073–2074, 2001.
B. M. E. Moret, S. K. Wyman, D. A. Bader, T. Warnow, and M. Yan. A new implementation and detailed study of breakpoint analysis. In Proceedings of the 6th Pacific Symposium on Biocomputing (PSB’01). World Scientific, pages 583–594, 2001.
B. M. E. Moret and H. D. Shapiro. Algorithms and experiments: the new (and old) methodology. Journal of Universal Computer Science, 7(5):434–446, 2001.
B. M. E. Moret, J. Tang, L.-S. Wang, and T. Warnow. Steps toward accurate reconstruction of phylogenies from gene-order data. Journal of Computer and System Sciences. To appear.
I. Pe'er and R. Shamir. The median problems for breakpoints are NP-complete. Electronic Colloquium on Computational Complexity, 71, 1998.
A. Rambaut and N. C. Grassly. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences, 13:235–238, 1997.
K. Rice, M. Donoghue, and R. Olmstead. Analyzing large datasets: rbcL 500 revisited. Systematic Biology, 46:554–562, 1997.
F. Rodrigues-Trelles, L. Alarcon, and A. Fontdevila. Molecular evolution and phylogeny of the buzzatii complex (D. repleta group): a maximum likelihood approach. Molecular Biology and Evolution, 17:1112–1122, 2000.
A. Rokas and P. W. H. Holland. Rare genomic changes as a tool for phylogenetics. Trends in Ecology and Evolution, 15:454–459, 2000.
N. Saitou and M. Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4:406–425, 1987.
D. Sankoff and M. Blanchette. Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology, 5:555–570, 1998.
A. C. Siepel and B. M. E. Moret. Finding an optimal inversion median: experimental results. In Proceedings of the 1st Workshop on Algorithms for Bioinformatics (WABI’01). Springer Lecture Notes in Computer Science 2149, pages 189–203, 2001.
K. St. John, T. Warnow, B. M. E. Moret, and L. Vawter. Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining. In Proceedings of the 12th Annual ACM/SIAM Symposium on Discrete Algorithms (SODA’01), pages 196–205, 2001.
K. Strimmer and A. von Haeseler. Quartet puzzling: a maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution, 13:964–969, 1996.
T. Warnow, B. M. E. Moret, and K. St. John. Absolute phylogeny: true trees from short sequences. In Proceedings of the 12th Annual ACM/SIAM Symposium on Discrete Algorithms (SODA’01), pages 186–195, 2001.
M. S. Waterman. Introduction to Computational Biology: Sequences, Maps and Genomes. Chapman & Hall, 1995.
L. Xiao, X. Zhang, and S. A. Kubricht. Improving memory performance of sorting algorithms. ACM Journal of Experimental Algorithmics, 5(3), 2000. Online at http://www.jea.acm.org/2000/XiaoMemory/.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Moret, B.M.E., Warnow, T. (2002). Reconstructing Optimal Phylogenetic Trees: A Challenge in Experimental Algorithmics. In: Fleischer, R., Moret, B., Schmidt, E.M. (eds) Experimental Algorithmics. Lecture Notes in Computer Science, vol 2547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36383-1_8
DOI: https://doi.org/10.1007/3-540-36383-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00346-5
Online ISBN: 978-3-540-36383-5
eBook Packages: Springer Book Archive