A genome analysis based on repeat sharing gene networks

Natural Computing

Abstract

Motivated by an interest in understanding how information is organized within genomes, and how genes communicate with each other in the transcription process, in this paper we propose a novel network-based methodology for genomic sequence analysis and apply it to three organisms: Nanoarchaeum equitans, Escherichia coli and Saccharomyces cerevisiae. A previously introduced dictionary-based approach is extended here through an analysis of repeats in genic and intergenic regions. The key results of this work come from a biological and computational analysis of novel parametrized gene networks, defined by fixed-length motifs that occur in multiple genes. Cliques emerge as groups of genes sharing a long repeat and have a clear biological interpretation, while an analysis of complete and paralog clusters has revealed some unexpected regularities. Repeat sharing gene networks may be applied in comparative genomics as a methodology for investigating the evolutionary and functional properties of genes.
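The network construction summarized above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions of ours, not necessarily the authors' exact definitions: nodes are genes, an edge joins two genes whenever they share at least one word of length k, reverse complements and intergenic regions are ignored, and maximal cliques are enumerated with the standard Bron–Kerbosch algorithm with pivoting (the article does not state which clique algorithm it uses). The function names and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kmer_index(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every word of length k to the set of genes in which it occurs."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def repeat_sharing_network(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Adjacency sets of a (hypothetical) repeat sharing gene network.

    Assumption: two genes are linked iff they share at least one k-mer.
    A k-mer found in a single gene yields no pairs, hence no edges.
    """
    adj: dict[str, set[str]] = {name: set() for name in genes}
    for owners in kmer_index(genes, k).values():
        for a, b in combinations(sorted(owners), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj: dict[str, set[str]]):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    def expand(r: set[str], p: set[str], x: set[str]):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genes.
    genes = {
        "g1": "ACGTACGTTT",
        "g2": "TTACGTACGA",
        "g3": "GGGACGTACG",
        "g4": "CCCCCCCCCC",
    }
    for k in (6, 8):
        net = repeat_sharing_network(genes, k)
        cliques = [sorted(c) for c in maximal_cliques(net) if len(c) > 1]
        print(k, cliques)
```

With these toy sequences the 6-mers ACGTAC and CGTACG occur in g1, g2 and g3, so the three genes form a clique at k = 6, whereas no 8-mer is shared and the network has no edges at k = 8. Under this definition every k-mer occurring in m genes induces an m-clique by construction, which matches the intuition of cliques as groups of genes sharing a long repeat; raising k makes the network sparser.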

Figs. 1–15 (not reproduced here)

Notes

  1. www.cbmc.it/external/Infogenomics3.

  2. For example, the capability of a protein to break chemical bonds or to phosphorylate another protein.

  3. For example, a protein involved in replication, energy production or movement.

  4. Localization of the protein, for example in the nucleus, on membranes or in ribosomes.

References

  • Aittokallio T, Schwikowski B (2006) Graph-based methods for analysing networks in cell biology. Brief Bioinform 7(3):243–255

  • Bartel DP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136(2):215–233. doi:10.1016/j.cell.2009.01.002

  • Brendel V, Busse H (1984) Genome structure described by formal languages. Nucleic Acids Res 12(94):2561–2568

  • Castellini A, Franco G, Manca V (2012) A dictionary based informational genome analysis. BMC Genomics 13(1):485. doi:10.1186/1471-2164-13-485

  • Castellini A et al (2011) Genome classification by dictionary-based indexes. Poster presented at the International Conference on Pattern Recognition in Bioinformatics (PRIB 2011)

  • Chor B, Horn D, Goldman N et al (2009) Genomic DNA k-mer spectra: models and modalities. Genome Biol 10:R108

  • Das S, Paul S, Bag SK, Dutta C (2006) Analysis of Nanoarchaeum equitans genome and proteome composition: indications for hyperthermophilic and parasitic adaptation. BMC Genomics 7:186

  • Dunham I, Kundaje A, Aldred S et al (The ENCODE Project Consortium) (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74

  • Fici G, Mignosi F, Restivo A et al (2006) Word assembly through minimal forbidden words. Theor Comput Sci 359:214–230

  • Fofanov Y, Luo Y, Katili C, Wang J, Belosludtsev Y, Powdrill T, Belapurkar C, Fofanov V, Li T-B, Chumakov S, Pettitt BM (2008) How independent are the appearances of n-mers in different genomes? Bioinformatics 20(15):2421–2428

  • Franco G (2013) Perspectives in computational genome analysis. In: Discrete and topological models in molecular biology. Springer, Berlin

  • Franco G, Milanese A (2013) An investigation on genomic repeats. LNCS 7921:149–160

  • Friedman RC, Farh KK, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1):92–105

  • Gottesman S (2004) The small RNA regulators of Escherichia coli: roles and mechanisms. Annu Rev Microbiol 58:303–328

  • Hampikian G, Andersen T (2007) Absent sequences: nullomers and primes. Pac Symp Biocomput 12:355–366

  • Herold J, Kurtz S, Giegerich R (2008) Efficient computation of absent words in genomic sequences. BMC Bioinform 9:167

  • Hoogeboom H, Kosters W (2008) Substring differences in genomes. In: Armañanzas R, Saeys Y, Inza I, García-Torres M, Van de Peer Y, Bielza C, Larrañaga P (eds) Proceedings of the Benelux Bioinformatics Conference (BBC 2008), Maastricht, The Netherlands, p 62

  • Hussein R, Lim HN (2012) Direct comparison of small RNA and transcription factor signalling. Nucleic Acids Res 40(15):7269–7279

  • International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921

  • Mandin P (2012) Genetic screens to identify bacterial sRNA regulators. Methods Mol Biol 905:41–60

  • Mizoguchi H, Mori H, Fujio T (2007) Escherichia coli minimum genome factory. Biotechnol Appl Biochem 46:157–167

  • Navarro G, Mäkinen V (2007) Compressed full-text indexes. ACM Comput Surv 39(1):2

  • Poliseno L (2012) Pseudogenes: newly discovered players in human cancer. Sci Signal 5(242):5. doi:10.1186/gb-2012-13-8-r77

  • Poliseno L, Salmena L, Zhang J et al (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465(7301):1033–1038

  • Searls DB (2002) The language of genes. Nature 420:211–217

  • Searls DB (2010) Molecules, languages and automata. LNAI 6339:5–10

  • Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504

  • Sharma CM, Vogel J (2009) Experimental approaches for the discovery and characterization of regulatory small RNA. Curr Opin Microbiol 12:536–546

  • Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, Lieberman J, Rigoutsos I, Pandolfi PP (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147(2):344–357

  • Vinga S, Almeida J (2003) Alignment-free sequence comparison—a review. Bioinformatics 19(4):513–523

  • Vinga S, Almeida J (2007) Local Renyi entropic profiles of DNA sequences. BMC Bioinform 8:393

  • Wagner EGH, Simons RW (1994) Antisense RNA control in bacteria, phages, and plasmids. Annu Rev Microbiol 48:713–742

  • Wu et al (2010) Modularity of Escherichia coli sRNA regulation revealed by sRNA-target and protein network analysis. BMC Bioinform 11(Suppl 7):S11

  • Zhou F, Olman V, Xu Y (2008) Barcodes for genomes and applications. BMC Bioinform 9:546

Acknowledgments

The first author was financially supported by CBMC (Center for Biomedical Computing) in Verona, Italy, which also provided the server on which all computations were performed. All the authors are grateful to the anonymous referees for their numerous and detailed suggestions for improvement, and to V. Manca for inspiring discussions on the infogenomic approach.

Author information

Corresponding author

Correspondence to Alberto Castellini.

About this article

Cite this article

Castellini, A., Franco, G. & Milanese, A. A genome analysis based on repeat sharing gene networks. Nat Comput 14, 403–420 (2015). https://doi.org/10.1007/s11047-014-9437-6

  • DOI: https://doi.org/10.1007/s11047-014-9437-6
