Information Theory in Genome Analysis

  • Conference paper
Membrane Computing (CMC 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9504)

Abstract

Classical concepts of Information Theory are briefly summarized, and their application to the computational analysis of genomes is outlined. Genomes are long strings, which opens the possibility of treating them as information sources. From this viewpoint, information entropy, mutual information, entropic divergences, codes, and dictionaries (finite formal languages) turn out to be fundamental tools for extracting the biological information on which biological functionalities are based. The importance of random genomes is also motivated, and some genomic distributions are presented and discussed.
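
To make the abstract's viewpoint concrete, the following is a minimal sketch, not the chapter's own method, of how a genome string might be treated as an information source. It assumes the source is modeled by the empirical distribution of overlapping words of length k (k-mers), and computes the Shannon entropy of that distribution and the Kullback-Leibler divergence between two such distributions. The function names `kmer_counts`, `entropy`, and `kl_divergence` and the toy sequences are illustrative assumptions.

```python
import math
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping word of length k (a k-mer) in the string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kl_divergence(p_counts: Counter, q_counts: Counter) -> float:
    """Kullback-Leibler divergence D(P || Q), in bits, between two empirical distributions.

    Returns infinity when P contains a word that never occurs in Q.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for word, c in p_counts.items():
        if q_counts[word] == 0:
            return math.inf
        p = c / p_total
        q = q_counts[word] / q_total
        divergence += p * math.log2(p / q)
    return divergence


if __name__ == "__main__":
    # Toy sequences, assumed for illustration only; real genomes are far longer.
    uniform = "ACGT"
    repetitive = "AAAA"
    print(entropy(kmer_counts(uniform, 1)))
    print(entropy(kmer_counts(repetitive, 1)))
    print(kl_divergence(kmer_counts(repetitive, 1), kmer_counts(uniform, 1)))
```

The set of keys returned by `kmer_counts` is a finite dictionary of words in the sense the abstract uses. Comparing the entropy of a real genome with that of a random string of the same length is one simple way such a sketch could be used.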

Author information

Corresponding author

Correspondence to Vincenzo Manca.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Manca, V. (2015). Information Theory in Genome Analysis. In: Rozenberg, G., Salomaa, A., Sempere, J., Zandron, C. (eds) Membrane Computing. CMC 2015. Lecture Notes in Computer Science, vol 9504. Springer, Cham. https://doi.org/10.1007/978-3-319-28475-0_1

  • DOI: https://doi.org/10.1007/978-3-319-28475-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28474-3

  • Online ISBN: 978-3-319-28475-0

  • eBook Packages: Computer Science, Computer Science (R0)
