Information Content of Sets of Biological Sequences Revisited

Carbone, Alessandra; Engelen, Stefan

doi:10.1007/978-3-540-88869-7_3

Part of the book series: Natural Computing Series (NCS)

Abstract

To analyze the information contained in a pool of amino acid sequences, a first approach is to align the sequences, estimate the probability that each amino acid occurs within each column of the alignment, and combine these values through an “entropy” function whose minimum corresponds to the absence of information, that is, to the case where every amino acid is equally likely to occur. An alternative is to construct a distance tree of the sequences from the alignment, based on sequence similarity, and to interpret the tree topology so as to model the evolutionary property of residue conservation. We introduce the concept of the “evolutionary content” of a tree of sequences, and demonstrate to what extent the more classical notion of “information content” of sequences approximates the new measure and in what manner tree topology contributes sharper information for the detection of protein binding sites.
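The first, column-based approach can be illustrated with a minimal Python sketch. It is not the chapter's own formulation, which is not reproduced here. It assumes the common Shannon form, in which the information content of a column is log2(20) minus the entropy of its observed residue frequencies, so that a uniform column scores 0 and a fully conserved column scores log2(20) ≈ 4.32 bits. The function names `column_information` and `alignment_information` are hypothetical, and skipping gap characters is a simplifying assumption.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_information(column):
    """Information content (bits) of one alignment column.

    Assumed definition: log2(20) - H, where H is the Shannon entropy of the
    observed residue frequencies. Gaps and non-standard symbols are skipped
    (a simplification for this sketch).
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0  # nothing observed: treat as no information
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(AMINO_ACIDS)) - entropy


def alignment_information(alignment):
    """Per-column information content for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_information("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACDA", "ACEG", "ACDC", "AC-T"]  # made-up example data
    for position, bits in enumerate(alignment_information(toy_alignment), start=1):
        print(f"column {position}: {bits:.3f} bits")
```

In the toy alignment, the first two columns are fully conserved and receive the maximum score, while the last column, with four different residues, scores lower. The tree-based "evolutionary content" is not sketched here, because its definition is given only in the full chapter.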

Author information

Authors and Affiliations

Génomique Analytique, Université Pierre et Marie Curie, INSERM UMRS511, 91, Bd de l’Hôpital, 75013, Paris, France
Alessandra Carbone & Stefan Engelen

Corresponding author

Correspondence to Alessandra Carbone.

Editor information

Editors and Affiliations

Dept. Computer Science, University of British Columbia, Main Mall 201-2366, Vancouver, V6T 1Z4, Canada
Anne Condon
Dept. Applied Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel
David Harel
Leiden Inst. Advanced Computer Science, Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, Netherlands
Joost N. Kok
Turku Centre for Computer Science, Lemminkaisenkatu 14 A, Turku, 20520, Finland
Arto Salomaa
Computer Science, Computation, California Inst. of Technology, Pasadena, 91125, U.S.A.
Erik Winfree

About this chapter

Cite this chapter

Carbone, A., Engelen, S. (2009). Information Content of Sets of Biological Sequences Revisited. In: Condon, A., Harel, D., Kok, J., Salomaa, A., Winfree, E. (eds) Algorithmic Bioprocesses. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88869-7_3

DOI: https://doi.org/10.1007/978-3-540-88869-7_3
Published: 13 August 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88868-0
Online ISBN: 978-3-540-88869-7
eBook Packages: Computer Science, Computer Science (R0)
