Abstract
Microbial interactions and relationships are significant for animals, insects and plants. Metagenomic research enables proper assessment and analysis of microbial organisms and communities. The analysis helps to gain detailed insights into microscopic insects. Recent machine learning techniques have focused on algorithms and data mining tools to examine the depth of interactions and relationships in metagenomic datasets. Accurate analysis of large gene datasets helps to solve real-world problems of public interest. In this regard, graph-centric representations of big gene datasets are very important. The De Bruijn graph is one of the pivotal media for demonstrating the relationships and interactions in large gene datasets or metagenomic datasets. In this research, a mapping-based metagenomic graphical (MetaG) genome representation is demonstrated. Data cleaning is performed before the graphical illustration is applied. Random mapping is used to assess the variations in the dataset. An Euler path-based De Bruijn graph is used to sketch gene annotation, translation, signaling and coding. This research helps computational biology to map genomic information graphically with clear concepts. Adequate experimental comparisons and analysis support the claims with tables and graphs.
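The abstract does not spell out how the Euler path-based De Bruijn graph is constructed, so the following is only a minimal sketch of the general technique, not the MetaG implementation. It assumes error-free reads, a hypothetical k-mer length `k`, and that an Eulerian path exists in the resulting graph; the function names `build_de_bruijn`, `eulerian_path` and `spell` are illustrative and do not come from the article.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph: each distinct k-mer becomes one edge
    from its (k-1)-length prefix to its (k-1)-length suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to keep the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes the graph has an Eulerian path."""
    out_deg = {node: len(targets) for node, targets in graph.items()}
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    nodes = sorted(set(out_deg) | set(in_deg))
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise (Eulerian cycle) any node works.
    start = next(
        (n for n in nodes if out_deg.get(n, 0) - in_deg.get(n, 0) == 1),
        nodes[0],
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path):
    """Turn a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Two overlapping toy reads (made-up data, not from the article).
toy_reads = ["ATGGCGT", "GCGTGCA"]
assembled = spell(eulerian_path(build_de_bruijn(toy_reads, 4)))
```

For these two toy reads and `k = 4`, the path visits every k-mer edge exactly once and `assembled` is `"ATGGCGTGCA"`. Real metagenomic data contains sequencing errors and repeats, which is one reason a cleaning step comes before graph construction.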
Cite this article
Chowdhury, L., Khan, M.I., Deb, K. et al. MetaG: a graph-based metagenomic gene analysis for big DNA data. Netw Model Anal Health Inform Bioinforma 5, 27 (2016). https://doi.org/10.1007/s13721-016-0132-7
DOI: https://doi.org/10.1007/s13721-016-0132-7