Co-evolution and Information Signals in Biological Sequences

  • Conference paper
Theory and Applications of Models of Computation (TAMC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5532)

Abstract

The information content of a pool of sequences has been defined in information theory through entropic measures that aim to capture the amount of variability within the sequences. When dealing with biological sequences coding for proteins, a first approach is to align the sequences, estimate the probability that each amino acid occurs at each alignment position, and combine these values through an “entropy” function whose minimum corresponds to the case where, at each position, every amino acid is equally likely to occur. This model is too restrictive when the purpose is to evaluate the sequence constraints that must be conserved to maintain the function of the proteins under random mutations. In fact, the co-evolution of amino acids appearing in pairs or tuples of positions in sequences constitutes a fine signal of important structural, functional and mechanical information for protein families. It is clear that classical information theory should be revisited when applied to biological data. A large number of approaches to co-evolution of biological sequences have been developed in the last seven years. We present a few of them and discuss their limitations and some related questions, such as the generation of random structures to validate predictions based on co-evolution, which appear crucial for new advances in structural bioinformatics.
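As a rough illustration of the two kinds of signal contrasted in the abstract, the sketch below computes a per-position entropy and a pairwise mutual information on a toy alignment. It is not the authors' method: mutual information is used here only as one common measure of covariation, and the alignment, the function names and the 20-letter alphabet size are assumptions made for the example. Note that Shannon entropy is largest for a uniformly distributed column, while the derived information content log2(20) - H is smallest there.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the symbols found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    # p * log2(1/p), written with counts to avoid a negative zero
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_content(symbols, alphabet_size=20):
    """log2(alphabet size) minus entropy; 20 amino acids is an assumption."""
    return math.log2(alphabet_size) - shannon_entropy(symbols)


def mutual_information(col_a, col_b):
    """MI(A;B) = H(A) + H(B) - H(A,B), estimated from paired columns."""
    joint = list(zip(col_a, col_b))
    return (shannon_entropy(col_a) + shannon_entropy(col_b)
            - shannon_entropy(joint))


# Hypothetical toy alignment: position 0 is conserved, positions 1 and 3
# vary together, and position 2 varies independently of position 1.
toy_alignment = [
    "ACDK",
    "ACEK",
    "AGDR",
    "AGER",
]

if __name__ == "__main__":
    cols = [column(toy_alignment, i) for i in range(4)]
    for i, col in enumerate(cols):
        print(i, shannon_entropy(col), information_content(col))
    print("MI(1,3) =", mutual_information(cols[1], cols[3]))
    print("MI(1,2) =", mutual_information(cols[1], cols[2]))
```

On this toy input, positions 1 and 3 each have one bit of entropy on their own, yet they share one full bit of mutual information, whereas positions 1 and 2 share none. A purely per-position measure cannot distinguish these two situations, which is the limitation the abstract points to. Real alignments would also need corrections for small sample size and shared phylogeny, which this sketch ignores.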

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carbone, A., Dib, L. (2009). Co-evolution and Information Signals in Biological Sequences. In: Chen, J., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2009. Lecture Notes in Computer Science, vol 5532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02017-9_4

  • DOI: https://doi.org/10.1007/978-3-642-02017-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02016-2

  • Online ISBN: 978-3-642-02017-9

  • eBook Packages: Computer Science, Computer Science (R0)
