MetaG: a graph-based metagenomic gene analysis for big DNA data

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Microbial interactions and relationships are significant for animals, insects and plants. Metagenomic research enables proper assessment and analysis of microbial organisms and communities, and this analysis helps to gain detailed insights into microscopic insects. Recent machine learning work has focused on algorithms and data mining tools that measure the depth of interactions and relationships in metagenomic datasets. Accurate analysis of large gene sets helps to solve real-world problems of public interest, so graph-centric representations of big gene datasets are very important. The De Bruijn graph is one of the pivotal media for demonstrating the relationships and interactions within a large gene dataset or metagenomic dataset. This research demonstrates a mapping-based metagenomic graphical (MetaG) genome representation. The data are cleaned before the graphical illustration is applied, and random mapping is used to assess the variation in the dataset. An Euler path-based De Bruijn graph is used to sketch gene annotation, translation, signaling and coding. The research helps computational biology to map genomic information graphically and with clear concepts. Experimental comparisons and analysis, reported in tables and graphs, support these claims.
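
To make the Euler path-based De Bruijn graph idea concrete, the following is a minimal textbook-style sketch, not the MetaG implementation, which this preview does not show. It assumes error-free reads, a single strand, and that an Eulerian path exists; the function names, the example reads and the choice k = 3 are all hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        # Edge from the k-mer's prefix to its suffix.
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes (does not check) that an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def spell(path):
    """Concatenate a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical overlapping reads, for illustration only.
reads = ["ATGGC", "GGCGT", "CGTGCA"]
graph = de_bruijn_graph(reads, k=3)
contig = spell(eulerian_path(graph))
print(contig)
```

An Eulerian path visits every edge (k-mer) exactly once, so the spelled sequence contains exactly the k-mers observed in the reads. When the graph branches, several Eulerian paths can exist, so the reconstruction is not necessarily unique.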

Author information

Corresponding author

Correspondence to Linkon Chowdhury.

About this article

Cite this article

Chowdhury, L., Khan, M.I., Deb, K. et al. MetaG: a graph-based metagenomic gene analysis for big DNA data. Netw Model Anal Health Inform Bioinforma 5, 27 (2016). https://doi.org/10.1007/s13721-016-0132-7

  • DOI: https://doi.org/10.1007/s13721-016-0132-7
