Abstract
Metagenomic binning is the clustering or taxonomic assignment of metagenomic sequences or contigs. Because metagenomic samples contain a vast number of organisms, the number of nucleotide sequences grows rapidly, which increases the complexity of binning algorithms. Unsupervised classification has gained attention in recent years, with various state-of-the-art tools released, because the reference databases required by reference-based methods are often lacking. Exploiting the overlap information between reads has driven the success of several unsupervised methods, which achieve high accuracy. These methods rest on the observation that the average proportion of common l-mers between genomes of different species is very small when l is sufficiently large. This paper introduces a novel algorithm for binning metagenomic sequences without requiring reference databases by utilizing highly connected components inside a weighted overlap graph of reads. Experimental results show that precision is improved over other well-known binning tools for both short and long sequences.
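The idea of a weighted overlap graph of reads can be illustrated with a small sketch. This is not the GMeta algorithm itself, whose details the abstract does not give. It assumes that an edge weight is the number of distinct l-mers two reads share, and it substitutes plain connected components over edges above a weight threshold for the stricter notion of highly connected components. The function names, the toy reads, and the values of `l` and `min_weight` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lmers(read, l):
    """Return the set of all length-l substrings of a read."""
    return {read[i:i + l] for i in range(len(read) - l + 1)}


def build_overlap_graph(reads, l):
    """Weight each read pair by the number of distinct l-mers they share.

    Assumption: shared l-mer count stands in for read overlap.
    """
    index = defaultdict(set)  # l-mer -> ids of reads containing it
    for rid, read in enumerate(reads):
        for mer in lmers(read, l):
            index[mer].add(rid)
    weights = defaultdict(int)  # (i, j) with i < j -> shared l-mer count
    for rids in index.values():
        for i, j in combinations(sorted(rids), 2):
            weights[(i, j)] += 1
    return weights


def connected_components(n, weights, min_weight):
    """Group reads linked by edges of weight >= min_weight (union-find).

    Simplification: ordinary connected components, not highly
    connected components in the graph-theoretic sense.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w >= min_weight:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for rid in range(n):
        groups[find(rid)].append(rid)
    return sorted(groups.values())


# Toy reads: the first two overlap each other, as do the last two.
reads = [
    "ACGTACGGTCA",
    "GTACGGTCATT",
    "TTTTGGGGCCCA",
    "TGGGGCCCAAA",
]
weights = build_overlap_graph(reads, l=5)
bins = connected_components(len(reads), weights, min_weight=3)
print(bins)  # [[0, 1], [2, 3]]
```

On this toy input the two overlapping pairs share five 5-mers each and nothing across pairs, so thresholding separates them into two groups. Real data would need a much larger l and an index that scales beyond all-pairs enumeration per l-mer.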
Acknowledgment
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2019-20-06.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pham, H.T., Vinh, L.V., Lang, T.V., Tran, V.H. (2019). GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning. In: Dang, T., Küng, J., Takizawa, M., Bui, S. (eds) Future Data and Security Engineering. FDSE 2019. Lecture Notes in Computer Science, vol 11814. Springer, Cham. https://doi.org/10.1007/978-3-030-35653-8_35
DOI: https://doi.org/10.1007/978-3-030-35653-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35652-1
Online ISBN: 978-3-030-35653-8
eBook Packages: Computer Science, Computer Science (R0)