GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning

Pham, Hong Thanh; Vinh, Le Van; Lang, Tran Van; Tran, Van Hoai

doi:10.1007/978-3-030-35653-8_35

Hong Thanh Pham (ORCID: orcid.org/0000-0002-9069-1751), Le Van Vinh, Tran Van Lang & Van Hoai Tran

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11814)

Included in the following conference series:

International Conference on Future Data and Security Engineering


Abstract

Metagenomic binning is the process of clustering metagenomic sequences or contigs, or assigning them to taxa. Because metagenomic samples contain an enormous number of organisms, the number of nucleotide sequences to process is huge, which makes binning algorithms complex. Unsupervised classification has gained popularity in recent years, with various state-of-the-art tools released, because reference-based methods require reference databases that are often unavailable. Exploiting the overlap information between reads has enabled several unsupervised methods to achieve excellent accuracy. These methods rest on the observation that, when l is sufficiently large, the average proportion of l-mers shared between genomes of different species is very small. This paper introduces a novel algorithm that bins metagenomic sequences without reference databases by using highly connected components in a weighted overlap graph of reads. Experimental results show that it improves precision over other well-known binning tools for both short and long sequences.
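The abstract only outlines the approach. As a rough illustration of the general idea, the Python sketch below builds an overlap graph from shared l-mers and clusters it into highly connected components. It is not GMeta's implementation. The graph construction (join two reads when they share at least `min_shared` distinct l-mers, with the shared count as the edge weight) and the clustering criterion (Hartuv and Shamir's HCS, where a subgraph on n vertices counts as highly connected when its edge connectivity exceeds n/2, with minimum cuts found by the Stoer-Wagner algorithm) are assumptions. The helper names `overlap_graph`, `min_cut`, `hcs` and `bin_reads`, and the parameter values, are hypothetical. In this sketch the weights only drive the threshold; the paper's exact weighting and component criterion may differ.

```python
from itertools import combinations


def lmers(seq, l):
    """Set of distinct l-mers (length-l substrings) in seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def overlap_graph(reads, l, min_shared):
    """Hypothetical weighted overlap graph: reads i < j are joined when they
    share at least min_shared distinct l-mers; the weight is that count."""
    sets = [lmers(r, l) for r in reads]
    edges = {}
    for i, j in combinations(range(len(reads)), 2):
        shared = len(sets[i] & sets[j])
        if shared >= min_shared:
            edges[(i, j)] = shared
    return edges


def min_cut(nodes, adj):
    """Stoer-Wagner global minimum cut with unit edge weights, restricted to
    `nodes`. Returns (cut_value, vertices on one side of the cut)."""
    groups = {v: {v} for v in nodes}  # super-vertex -> original vertices
    w = {u: {v: 1 for v in adj[u] if v in groups} for u in nodes}
    best_value, best_side = float("inf"), set()
    active = list(nodes)
    while len(active) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly
        # connected remaining vertex; the last two added are s and t.
        order = [active[0]]
        conn = {v: w[active[0]].get(v, 0) for v in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            phase_cut = conn.pop(nxt)
            order.append(nxt)
            for v in conn:
                conn[v] += w[nxt].get(v, 0)
        s, t = order[-2], order[-1]
        if phase_cut < best_value:  # cut-of-the-phase separates t's group
            best_value, best_side = phase_cut, set(groups[t])
        # Contract t into s, summing parallel edge weights.
        groups[s] |= groups.pop(t)
        for v, wt in w.pop(t).items():
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
                del w[v][t]
        w[s].pop(t, None)
        active.remove(t)
    return best_value, best_side


def hcs(nodes, adj):
    """Highly Connected Subgraphs (Hartuv & Shamir): keep a subgraph on n
    vertices if its edge connectivity exceeds n / 2, otherwise split it
    along a minimum cut and recurse on both halves."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return [nodes]
    cut, side = min_cut(list(nodes), adj)
    if cut > len(nodes) / 2:
        return [nodes]
    return hcs(side, adj) + hcs(nodes - side, adj)


def bin_reads(reads, l, min_shared):
    """Bin reads as the highly connected components of the overlap graph."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in overlap_graph(reads, l, min_shared):
        adj[i].add(j)
        adj[j].add(i)
    return sorted(sorted(c) for c in hcs(range(len(reads)), adj))


if __name__ == "__main__":
    # Toy data: three overlapping 10-base reads from each of two short "genomes".
    genome_a, genome_b = "ACGTTGCAAGGC", "GACTCAGTCTAC"
    reads = [g[i:i + 10] for g in (genome_a, genome_b) for i in range(3)]
    print(bin_reads(reads, l=4, min_shared=3))  # [[0, 1, 2], [3, 4, 5]]
```

On the toy input, reads drawn from the same 12-base "genome" share five or six 4-mers with each other and none across genomes, so the overlap graph consists of two triangles and each becomes one bin. Real reads would call for a larger l and a threshold tuned to read length and sequencing error rate, in line with the abstract's remark that shared l-mers between species become rare only when l is large enough.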



Acknowledgment

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2019-20-06.

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
Van Hoai Tran
Information Technology Office, Hoa Sen University, Ho Chi Minh City, Vietnam
Hong Thanh Pham
Faculty of Information Technology, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City, Vietnam
Le Van Vinh
Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), Hanoi, Vietnam
Tran Van Lang

Corresponding author

Correspondence to Tran Van Lang.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tran Khanh Dang
Johannes Kepler Universität Linz, Linz, Austria
Josef Küng
Hosei University, Tokyo, Japan
Makoto Takizawa
Telecommunications University, Nha Trang City, Vietnam
Son Ha Bui

About this paper

Cite this paper

Pham, H.T., Vinh, L.V., Lang, T.V., Tran, V.H. (2019). GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning. In: Dang, T., Küng, J., Takizawa, M., Bui, S. (eds) Future Data and Security Engineering. FDSE 2019. Lecture Notes in Computer Science, vol 11814. Springer, Cham. https://doi.org/10.1007/978-3-030-35653-8_35

DOI: https://doi.org/10.1007/978-3-030-35653-8_35
Published: 20 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35652-1
Online ISBN: 978-3-030-35653-8
eBook Packages: Computer Science, Computer Science (R0)