DOI: 10.1145/3592686.3592742
research-article

Viabin: A novel method for Overlapped Binning of Metagenomic Contigs using ZINB-autodecoder and Assembly Graphs: Viabin: a metagenomic contigs binning tool

Published: 31 May 2023

ABSTRACT

Binning metagenomic contigs into putative genomes is an essential step in investigating and characterizing the genetic material of different species and in understanding microbial populations. Existing tools have achieved considerable success on a variety of datasets, especially for long sequences, but accurately classifying short contigs in complex metagenomes remains challenging. Because different species may share identical sequences in their genomes, a single contig can belong to multiple species. However, current contig binning tools support only single-label binning; that is, each contig is assigned to exactly one species. This article introduces Viabin, which refines binning results using a zero-inflated negative binomial (ZINB) autoencoder and assembly graphs. Viabin first encodes the unclassified contigs with a ZINB-based autoencoder and then uses a Gaussian mixture model to classify them preliminarily. It then combines the assembly graph and sequence features with the binning results of other tools to produce the final bins. Importantly, Viabin can assign a single contig to multiple bins. Experiments on three datasets demonstrated that Viabin not only improves the binning of short contigs but also supports assigning contigs to multiple bins.
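The abstract does not spell out the autoencoder's training objective. As a hedged sketch, a ZINB autoencoder is usually trained by minimizing the negative log-likelihood of the observed counts under a zero-inflated negative binomial whose mean μ, inverse dispersion θ and zero-inflation probability π are predicted by the decoder. The pure-Python functions below compute that likelihood for one count and for a batch. The function names, the ε smoothing term, and the assumption that the inputs are non-negative integer counts (for example per-contig k-mer or coverage counts) are illustrative choices, not Viabin's published implementation.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float, eps: float = 1e-10) -> float:
    """Log-probability of count x under a negative binomial with mean mu
    and inverse dispersion theta (variance = mu + mu**2 / theta)."""
    log_theta_mu = math.log(theta + mu + eps)
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * (math.log(theta + eps) - log_theta_mu)
        + x * (math.log(mu + eps) - log_theta_mu)
    )


def zinb_nll(x: int, mu: float, theta: float, pi: float, eps: float = 1e-10) -> float:
    """Negative log-likelihood of count x under a zero-inflated NB,
    where pi is the probability of an extra (structural) zero."""
    nb_log = nb_log_pmf(x, mu, theta, eps)
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(nb_log) + eps)
    # Non-zero counts can only come from the NB component.
    return -(math.log(1.0 - pi + eps) + nb_log)


def zinb_loss(xs, mus, thetas, pis) -> float:
    """Mean ZINB negative log-likelihood over a batch of counts, where each
    count has its own decoder-predicted (mu, theta, pi)."""
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(xs, mus, thetas, pis)]
    return sum(terms) / len(terms)
```

In an actual autoencoder these terms would be computed on tensors inside an automatic-differentiation framework so that the decoder outputs can be trained by gradient descent; the scalar version only shows the arithmetic of the likelihood.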
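The Gaussian mixture step can be illustrated with a small expectation-maximization (EM) fit on contig embeddings. The sketch below assumes diagonal covariances, a deterministic farthest-point initialization, and a hypothetical rule that turns posterior probabilities into possibly overlapping bin sets by keeping every component whose posterior reaches a threshold (falling back to the most likely component). These are assumptions for illustration only; the abstract does not state how Viabin parameterizes its mixture or whether it uses a posterior threshold for multi-bin assignment.

```python
import math


def _log_gauss_diag(x, mean, var):
    """Log-density of point x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(points, means, variances, weights):
    """E-step: posterior probability of each mixture component for each point."""
    out = []
    for x in points:
        logs = [math.log(w) + _log_gauss_diag(x, m, v)
                for m, v, w in zip(means, variances, weights)]
        top = max(logs)  # subtract the max for numerical stability
        exps = [math.exp(lp - top) for lp in logs]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def fit_diag_gmm(points, k, n_iter=100, min_var=1e-6):
    """Fit a k-component diagonal GMM to points (lists of floats) with EM."""
    d = len(points[0])
    # Deterministic farthest-point initialization of the component means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = posteriors(points, means, variances, weights)
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(points)
            means[j] = [sum(r[j] * p[t] for r, p in zip(resp, points)) / nj
                        for t in range(d)]
            variances[j] = [max(min_var, sum(r[j] * (p[t] - means[j][t]) ** 2
                                             for r, p in zip(resp, points)) / nj)
                            for t in range(d)]
    return means, variances, weights


def assign_bins(resp, threshold=0.3):
    """Hypothetical multi-bin rule: keep every component whose posterior
    reaches the threshold, or the single most likely one if none does."""
    bins = []
    for row in resp:
        chosen = [j for j, r in enumerate(row) if r >= threshold]
        bins.append(chosen or [max(range(len(row)), key=row.__getitem__)])
    return bins
```

For example, a contig whose posteriors are [0.55, 0.45] would be placed in both bins at the default threshold of 0.3, whereas one with [0.95, 0.05] would stay in bin 0 only. A production tool would more likely use an established GMM implementation than this hand-written EM loop.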
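The final step combines the assembly graph with the bins produced by other tools. One simple way to picture this, assumed here purely for illustration, is label propagation over the assembly graph: contigs already binned by other tools act as seeds, and an unbinned contig inherits every bin label carried by at least a chosen fraction of its labeled neighbors, so a contig that bridges two genomes can receive both labels. The function name, the `min_fraction` parameter and the synchronous update rule are hypothetical, and the sketch ignores the sequence features that Viabin also combines with the graph.

```python
from collections import Counter


def propagate_bins(graph, seeds, min_fraction=0.5, max_rounds=10):
    """Spread bin labels from seed contigs across an assembly graph.

    graph: dict mapping each contig to the set of its neighbors (undirected).
    seeds: dict mapping already-binned contigs to their set of bin ids.
    Returns a dict mapping every contig that received a label to a set of
    bin ids; a contig may end up in several bins.
    """
    labels = {c: set(b) for c, b in seeds.items()}
    for _ in range(max_rounds):
        updates = {}
        for contig, nbrs in graph.items():
            if contig in labels:
                continue  # seeds and earlier assignments are kept fixed
            labeled = [n for n in nbrs if n in labels]
            if not labeled:
                continue
            # Count how many labeled neighbors carry each bin id.
            votes = Counter(b for n in labeled for b in labels[n])
            chosen = {b for b, v in votes.items() if v / len(labeled) >= min_fraction}
            if chosen:
                updates[contig] = chosen
        if not updates:
            break
        # Synchronous update: this round only used labels from previous rounds.
        labels.update(updates)
    return labels


if __name__ == "__main__":
    toy_graph = {
        "a": {"x", "y"},
        "x": {"a", "d"},
        "d": {"x"},
        "y": {"a"},
        "z": set(),
    }
    toy_seeds = {"a": {1}, "d": {2}}
    print(propagate_bins(toy_graph, toy_seeds))
```

On the toy graph, contig `x` is adjacent to one contig from bin 1 and one from bin 2, so with `min_fraction=0.5` it ends up in both bins; `y` joins bin 1, and the isolated contig `z` stays unbinned.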

Published in: BIC '23: Proceedings of the 2023 3rd International Conference on Bioinformatics and Intelligent Computing (ACM Other Conferences), February 2023, 398 pages
ISBN: 9798400700200
DOI: 10.1145/3592686

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited