Abstract
Classical concepts of Information Theory are briefly summarized, and their application to the computational analysis of genomes is outlined. Genomes are long strings, which opens the possibility of treating them as information sources. From this viewpoint, information entropy, mutual information, entropic divergences, codes, and dictionaries (finite formal languages) turn out to be fundamental tools for extracting the biological information on which biological functionalities are based. The importance of random genomes is also motivated, and some genomic distributions are presented and discussed.
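As an illustration of the viewpoint described in the abstract, the following minimal Python sketch treats a genome as a string, builds its dictionary of k-mers (words of length k), and computes the Shannon entropy of the resulting empirical distribution together with the Kullback-Leibler divergence between two such distributions. The function names (`kmer_counts`, `entropy`, `kl_divergence`) and the toy sequences are assumptions made for this example; they are not taken from the chapter, whose own definitions and tools may differ.

```python
import math
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping word of length k (k-mer) in the string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kl_divergence(p: Counter, q: Counter) -> float:
    """Kullback-Leibler divergence D(P || Q), in bits, between two empirical
    distributions. Returns infinity when Q gives zero mass to a word of P."""
    p_total = sum(p.values())
    q_total = sum(q.values())
    divergence = 0.0
    for word, count in p.items():
        if q[word] == 0:  # Counter returns 0 for missing words
            return math.inf
        p_w = count / p_total
        q_w = q[word] / q_total
        divergence += p_w * math.log2(p_w / q_w)
    return divergence


if __name__ == "__main__":
    # Toy sequences, purely illustrative (not real genomic data).
    a = kmer_counts("ACGTACGTTTGACA", 2)
    b = kmer_counts("ACGTTTTTTTGACA", 2)
    print("2-mer entropy of a:", entropy(a))
    print("D(a || b):", kl_divergence(a, b))
```

On real genomes the same quantities would be computed over much longer strings and several values of k, typically with suffix-array-based indexing rather than a plain `Counter`.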
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Manca, V. (2015). Information Theory in Genome Analysis. In: Rozenberg, G., Salomaa, A., Sempere, J., Zandron, C. (eds) Membrane Computing. CMC 2015. Lecture Notes in Computer Science, vol 9504. Springer, Cham. https://doi.org/10.1007/978-3-319-28475-0_1
DOI: https://doi.org/10.1007/978-3-319-28475-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28474-3
Online ISBN: 978-3-319-28475-0
eBook Packages: Computer Science (R0)