Abstract
Substitution matrices are at the heart of bioinformatics: sequence alignment, database search, phylogenetic inference, and protein family classification all rely on BLOSUM, PAM, JTT, mtREV24, and other matrices. These matrices provide a means of computing models of evolution and of assessing the statistical relationships among sequences. This paper reports two results. First, we show how Bayesian and grid settings can be used to derive novel, specific substitution matrices for fish and insects, and we compare their performance with that of standard amino acid replacement matrices. Second, we discuss a novel application of these matrices: a refinement of the mutual information formula for amino acid alignments that incorporates a substitution matrix into the calculation. We show that different substitution matrices give qualitatively different mutual information results and that the new algorithm yields better estimates of the similarity along a sequence alignment. We thus propose a procedure that generates ad hoc substitution matrices from a collection of sequences and combines them with mutual information to detect sequence patterns.
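The abstract does not state the refined formula, so the following Python sketch is only one plausible reading of "incorporating a substitution matrix into the calculation of the mutual information", not the authors' algorithm. It computes the standard mutual information between two alignment columns and, as an assumption, a smoothed variant in which each observed residue spreads its count over other residues according to a row-stochastic replacement matrix. The function names, the `subst` dictionary layout, and the toy four-letter alphabet are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(a, b): probability}."""
    px, py = Counter(), Counter()
    for (a, b), p in joint.items():
        px[a] += p
        py[b] += p
    return sum(
        p * log2(p / (px[a] * py[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def empirical_joint(col_x, col_y):
    """Observed pair frequencies for two equally long alignment columns."""
    n = len(col_x)
    return {pair: c / n for pair, c in Counter(zip(col_x, col_y)).items()}


def smoothed_joint(col_x, col_y, subst):
    """Pair frequencies after spreading each residue over its likely replacements.

    ASSUMPTION: subst[a][b] is the probability that residue a is replaced
    by residue b, with each row summing to 1. This weighting scheme is an
    illustrative choice, not taken from the paper.
    """
    n = len(col_x)
    joint = Counter()
    for a, b in zip(col_x, col_y):
        for a2, b2 in product(subst[a], subst[b]):
            joint[(a2, b2)] += subst[a][a2] * subst[b][b2] / n
    return dict(joint)


if __name__ == "__main__":
    # Toy columns: A always pairs with L, G always pairs with V.
    col_x, col_y = "AAGG", "LLVV"
    # Hypothetical replacement probabilities (not a real BLOSUM/PAM matrix).
    toy_subst = {
        "A": {"A": 0.9, "G": 0.1},
        "G": {"A": 0.1, "G": 0.9},
        "L": {"L": 0.9, "V": 0.1},
        "V": {"L": 0.1, "V": 0.9},
    }
    print(mutual_information(empirical_joint(col_x, col_y)))
    print(mutual_information(smoothed_joint(col_x, col_y, toy_subst)))
```

Under this reading, an identity matrix reproduces the standard mutual information, while a matrix that treats residues as freely interchangeable drives the value toward zero, which is one way that different matrices could give qualitatively different results.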
Keywords
- Mutual Information
- Phylogenetic Inference
- Markov Chain Model
- Markov Chain Monte Carlo Method
- Amino Acid Replacement
These keywords were added by machine and not by the authors.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kitchovitch, S., Song, Y., van der Wath, R., Liò, P. (2009). Substitution Matrices and Mutual Information Approaches to Modeling Evolution. In: Stützle, T. (eds) Learning and Intelligent Optimization. LION 2009. Lecture Notes in Computer Science, vol 5851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11169-3_19
DOI: https://doi.org/10.1007/978-3-642-11169-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11168-6
Online ISBN: 978-3-642-11169-3
eBook Packages: Computer Science (R0)