Abstract
The information content of a pool of sequences has been defined in information theory through entropic measures that aim to capture the amount of variability within the sequences. When dealing with biological sequences coding for proteins, a first approach is to align these sequences, estimate the probability that each amino acid occurs at each alignment position, and combine these values through an “entropy” function whose minimum corresponds to the case where, at each position, every amino acid has the same probability of occurring. This model is too restrictive when the purpose is to evaluate the sequence constraints that must be conserved to maintain the function of the proteins under random mutations. In fact, the co-evolution of amino acids appearing in pairs or tuples of positions in sequences constitutes a fine signal of important structural, functional and mechanical information for protein families. Classical information theory should therefore be revisited when applied to biological data. A large number of approaches to the co-evolution of biological sequences have been developed in the last seven years. We present a few of them and discuss their limitations and some related questions, such as the generation of random structures to validate predictions based on co-evolution, which appear crucial for new advances in structural bioinformatics.
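To make the two ideas in the abstract concrete, the following minimal sketch computes a per-position measure and a pairwise measure on a toy alignment. It is an illustration under stated assumptions, not the method of the paper: the abstract does not give its “entropy” function, so the sketch uses the Shannon entropy H of an alignment column and the complementary quantity log2(20) − H, which is smallest when all twenty amino acids are equally likely. For pairs of positions it uses mutual information, one widely used co-variation measure. The function names, the toy alignment and the choice to ignore gaps and phylogeny are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy alignment: four aligned sequences of length four, no gaps.
alignment = ["AKLE", "AKVD", "ARLE", "ARVD"]


def column(aln, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in aln]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_content(symbols, alphabet_size=20):
    """log2(alphabet_size) - H: zero for a uniform column, maximal when conserved."""
    return math.log2(alphabet_size) - shannon_entropy(symbols)


def mutual_information(col_i, col_j):
    """MI(i, j) = H(i) + H(j) - H(i, j), estimated from observed residue pairs."""
    joint = list(zip(col_i, col_j))
    return shannon_entropy(col_i) + shannon_entropy(col_j) - shannon_entropy(joint)


if __name__ == "__main__":
    for i in range(len(alignment[0])):
        col = column(alignment, i)
        print(i, shannon_entropy(col), information_content(col))
    # Positions 2 and 3 always change together; positions 1 and 2 vary independently.
    print(mutual_information(column(alignment, 2), column(alignment, 3)))
    print(mutual_information(column(alignment, 1), column(alignment, 2)))
```

In this toy case position 0 is fully conserved, so it has the highest information content but carries no pairwise signal, while positions 2 and 3 are individually variable yet perfectly coupled. That is the kind of constraint a purely per-position measure cannot see. On real protein families, raw mutual information is also affected by shared ancestry and by small sample sizes, which this sketch does not correct for.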
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carbone, A., Dib, L. (2009). Co-evolution and Information Signals in Biological Sequences. In: Chen, J., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2009. Lecture Notes in Computer Science, vol 5532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02017-9_4
DOI: https://doi.org/10.1007/978-3-642-02017-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02016-2
Online ISBN: 978-3-642-02017-9
eBook Packages: Computer Science, Computer Science (R0)