Estimating Evolutionary Distances from Spaced-Word Matches

Morgenstern, Burkhard; Zhu, Binyao; Horwege, Sebastian; Leimeister, Chris-André

doi:10.1007/978-3-662-44753-6_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8701)

Abstract

Alignment-free methods are increasingly used to estimate distances between DNA and protein sequences and to reconstruct phylogenetic trees. Most distance functions used by these methods, however, are heuristic measures of dissimilarity, not based on any explicit model of evolution. Herein, we propose a simple estimator of the evolutionary distance between two DNA sequences calculated from the number of (spaced) word matches between them. We show that this distance function estimates the evolutionary distance between DNA sequences more accurately than other distance measures used by alignment-free methods. In addition, we calculate the variance of the number of (spaced) word matches depending on sequence length and mismatch probability.
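The idea of turning a count of spaced-word matches into a distance can be illustrated with a minimal Python sketch. The abstract does not state the estimator's formula, so everything below is an assumption made for illustration and should not be read as the authors' method: the two sequences are assumed to be indel-free and aligned from their first position, a spaced word at a homologous position pair is assumed to match with probability p^k (k being the number of match positions in the pattern), every other position pair is assumed to match with background probability q^k, and the resulting match probability p is converted to substitutions per site with the Jukes-Cantor correction. The function names, the binary pattern notation (`1` = match position, `0` = don't-care position), and the default q = 0.25 are hypothetical.

```python
import math
from collections import Counter


def spaced_words(seq, pattern):
    """Count the spaced words of seq for a binary pattern such as '1101'.

    '1' marks a match position, '0' a don't-care position (assumed notation).
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return Counter(
        "".join(seq[i + j] for j in match_pos)
        for i in range(len(seq) - span + 1)
    )


def count_matches(s1, s2, pattern):
    """Number of position pairs (i, j) whose spaced words are identical."""
    w1 = spaced_words(s1, pattern)
    w2 = spaced_words(s2, pattern)
    return sum(n * w2[w] for w, n in w1.items())


def estimate_match_prob(n_matches, n1, n2, k, q=0.25):
    """Solve the assumed model E[N] = h * p**k + (n1 * n2 - h) * q**k for p.

    n1, n2: number of word positions in each sequence; h = min(n1, n2) is the
    assumed number of homologous position pairs; q is the assumed background
    match probability per nucleotide.
    """
    homologous = min(n1, n2)
    background = (n1 * n2 - homologous) * q ** k
    p_k = (n_matches - background) / homologous
    p_k = min(max(p_k, 0.0), 1.0)  # clamp sampling noise into [0, 1]
    return p_k ** (1.0 / k)


def jukes_cantor(p_match):
    """Jukes-Cantor distance (substitutions per site) from a match probability."""
    mismatch = 1.0 - p_match
    if mismatch >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * mismatch / 3.0)


def estimate_distance(s1, s2, pattern, q=0.25):
    """Distance estimate from the observed number of spaced-word matches."""
    k = pattern.count("1")
    n1 = len(s1) - len(pattern) + 1
    n2 = len(s2) - len(pattern) + 1
    n_matches = count_matches(s1, s2, pattern)
    return jukes_cantor(estimate_match_prob(n_matches, n1, n2, k, q))
```

For example, with the pattern `101` the sequences `ACGT` and `AAGT` share one spaced-word match (`A*G` at the first position), even though they differ at the don't-care position. Counting matches through word-frequency tables avoids comparing all position pairs explicitly.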

Author information

Authors and Affiliations

Institute of Microbiology and Genetics, Department of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
Burkhard Morgenstern, Sebastian Horwege & Chris-André Leimeister
Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA, Université d’Evry Val d’Essonne, 23 Boulevard de France, 91037, Evry, France
Burkhard Morgenstern
Institute of Microbiology and Genetics, Department of General Microbiology, University of Göttingen, Grisebachstr. 8, 37077, Göttingen, Germany
Binyao Zhu

Editor information

Editors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, ON, Canada
Dan Brown
Institute of Microbiology and Genetics, Department of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
Burkhard Morgenstern

Cite this paper

Morgenstern, B., Zhu, B., Horwege, S., Leimeister, C.-A. (2014). Estimating Evolutionary Distances from Spaced-Word Matches. In: Brown, D., Morgenstern, B. (eds) Algorithms in Bioinformatics. WABI 2014. Lecture Notes in Computer Science, vol 8701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44753-6_13

DOI: https://doi.org/10.1007/978-3-662-44753-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44752-9
Online ISBN: 978-3-662-44753-6
eBook Packages: Computer Science, Computer Science (R0)