Co-evolution and Information Signals in Biological Sequences

Carbone, Alessandra; Dib, Linda

doi:10.1007/978-3-642-02017-9_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5532)

Included in the following conference series:

International Conference on Theory and Applications of Models of Computation

Abstract

In information theory, the information content of a pool of sequences is defined through entropic measures that aim to capture the amount of variability within the sequences. When dealing with biological sequences coding for proteins, a first approach is to align these sequences, estimate the probability of each amino acid occurring at each alignment position, and combine these values through an “entropy” function whose minimum corresponds to the case where, at each position, every amino acid has the same probability of occurring. This model is too restrictive when the purpose is to evaluate the sequence constraints that must be conserved to maintain protein function under random mutations. In fact, the co-evolution of amino acids appearing in pairs or tuples of positions in sequences constitutes a fine signal of important structural, functional and mechanical information for protein families. Classical information theory should clearly be revisited when applied to biological data. A large number of approaches to the co-evolution of biological sequences have been developed in the last seven years. We present a few of them and discuss their limitations, along with some related questions, such as the generation of random structures to validate predictions based on co-evolution, which appear crucial for new advances in structural bioinformatics.
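The position-wise measure and the pairwise co-evolution signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' method. The function names, the toy alignment, the normalization by a 20-letter amino-acid alphabet, and the choice of mutual information as the pairwise signal are all assumptions made for illustration, and gaps are assumed to be absent. The function `information_content` is written as log2(20) minus the Shannon entropy, so that it is minimal (zero) when every amino acid is equally likely at a position, which matches the behaviour the abstract attributes to its “entropy” function.

```python
import math
from collections import Counter

# Assumption: the standard 20-letter amino-acid alphabet, no gap symbols.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column(alignment, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in alignment]


def column_entropy(alignment, i):
    """Shannon entropy (in bits) of the residue frequencies at position i."""
    counts = Counter(column(alignment, i))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_content(alignment, i):
    """log2(20) minus the entropy: zero for a uniform column,
    maximal for a fully conserved one."""
    return math.log2(len(AMINO_ACIDS)) - column_entropy(alignment, i)


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between positions i and j,
    computed as H(i) + H(j) - H(i, j)."""
    pairs = Counter(zip(column(alignment, i), column(alignment, j)))
    n = sum(pairs.values())
    joint_entropy = -sum((c / n) * math.log2(c / n) for c in pairs.values())
    return (column_entropy(alignment, i)
            + column_entropy(alignment, j)
            - joint_entropy)


# Hypothetical toy alignment: position 0 is fully conserved,
# positions 1 and 2 vary together, position 3 varies independently of 1.
toy = ["AKLD", "AKLE", "ARVD", "ARVE"]

for pos in range(4):
    print(pos, column_entropy(toy, pos), information_content(toy, pos))

print("MI(1, 2) =", mutual_information(toy, 1, 2))
print("MI(1, 3) =", mutual_information(toy, 1, 3))
```

In this toy case positions 1 and 2 have the same single-position entropy as position 3, so a per-position measure cannot tell them apart. Only the pairwise score separates the covarying pair (1, 2) from the independent pair (1, 3), which is the kind of signal the abstract argues a purely position-wise model misses.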

Author information

Authors and Affiliations

Département d’Informatique, Université Pierre et Marie Curie-Paris 6
Alessandra Carbone
Génomique Analytique, FRE3214 CNRS-UPMC, 15, Rue de l’Ecole de Médecine, 75005, Paris
Linda Dib

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Texas A&M University, College Station, 77843, Texas, USA
Jianer Chen
School of Mathematics, University of Leeds, LS2 9JT, U.K.
S. Barry Cooper

Cite this paper

Carbone, A., Dib, L. (2009). Co-evolution and Information Signals in Biological Sequences. In: Chen, J., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2009. Lecture Notes in Computer Science, vol 5532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02017-9_4

DOI: https://doi.org/10.1007/978-3-642-02017-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02016-2
Online ISBN: 978-3-642-02017-9
eBook Packages: Computer Science, Computer Science (R0)
