Abstract
We model the evolution of biological and linguistic sequences by comparing their statistical properties. The comparison is performed with efficiently computable kernel functions that take two sequences as input and return a measure of the statistical similarity between them. We show how such kernels make it possible to reconstruct the phylogenetic tree of primates from the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages from sample documents in existing languages.
Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have focused on efficient methods for sequence comparison and analysis. A large toolbox of algorithms already exists for analyzing data with kernels; in this paper we demonstrate their use in combination with standard phylogenetic reconstruction algorithms and visualization methods.
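To make the idea of a sequence kernel concrete, the following is a minimal sketch, assuming a p-spectrum kernel (an inner product of substring counts). The abstract does not say which kernel is used, so this choice, the function names, the parameter p and the toy sequences are all illustrative assumptions rather than the authors' method. The sketch turns the normalized kernel into a pairwise distance, which is the kind of input a distance-based tree-building algorithm expects.

```python
from collections import Counter
from math import sqrt


def spectrum_kernel(s, t, p=3):
    """Inner product of the p-mer count vectors of s and t (assumed kernel)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    # Only p-mers present in both sequences contribute to the sum.
    return sum(n * ct[kmer] for kmer, n in cs.items())


def kernel_distance(s, t, p=3):
    """Distance induced by the cosine-normalized kernel: sqrt(2 - 2 * k)."""
    kss = spectrum_kernel(s, s, p)
    ktt = spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:
        raise ValueError("each sequence must contain at least p symbols")
    k = spectrum_kernel(s, t, p) / sqrt(kss * ktt)
    # Clamp at zero to absorb floating-point round-off.
    return sqrt(max(0.0, 2.0 - 2.0 * k))


def distance_matrix(seqs, p=3):
    """Pairwise distances for a dict mapping name -> sequence."""
    names = list(seqs)
    rows = [[kernel_distance(seqs[a], seqs[b], p) for b in names] for a in names]
    return names, rows


if __name__ == "__main__":
    # Toy sequences for illustration only, not real mtDNA or text data.
    toy = {"a": "ACGTACGTAC", "b": "ACGTTCGTAC", "c": "TTTTGGGGCC"}
    names, rows = distance_matrix(toy, p=3)
    for name, row in zip(names, rows):
        print(name, [round(d, 3) for d in row])
```

The resulting matrix could then be handed to a standard reconstruction method such as neighbor joining, or to multidimensional scaling for visualization. The same functions apply unchanged to text if documents are treated as character strings.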
Cite this article
Bresco, M., Turchi, M., De Bie, T. et al. Modeling sequence evolution with kernel methods. Comput Optim Appl 38, 281–298 (2007). https://doi.org/10.1007/s10589-007-9045-9
DOI: https://doi.org/10.1007/s10589-007-9045-9