ABSTRACT
Binning metagenomic contigs into putative genomes is an essential step in characterizing the genetic material of different species and understanding microbial populations. Existing tools have achieved considerable success on a variety of datasets, especially with long sequences, but precise classification of short contigs in complex metagenomes remains challenging. Because different species may share identical sequences in their genomes, a contig can legitimately belong to more than one species. However, current contig binning tools support only single-label binning; that is, each contig is assigned to exactly one species. This article introduces Viabin, which refines binning results using an autoencoder based on the zero-inflated negative binomial (ZINB) distribution and assembly graphs. Viabin uses a Gaussian mixture model to preliminarily classify the unclassified contigs, which are encoded by the ZINB-based autoencoder. It then combines the assembly graph and sequence features with the binning results of other tools to complete the final binning. More importantly, it can assign a contig to multiple bins. Experiments on three datasets demonstrate that Viabin not only improves binning performance on short contigs but also supports assigning contigs to multiple bins.
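The abstract does not state the loss that the ZINB-based autoencoder optimizes. As a minimal sketch, assuming the standard mean/dispersion parameterization of the zero-inflated negative binomial, the per-count log-likelihood that such an autoencoder would typically be trained on could be written as follows. The function names (`zinb_log_pmf`, `zinb_nll`) and parameter names (`mu`, `theta`, `pi`) are illustrative and are not taken from Viabin.

```python
import math


def zinb_log_pmf(k, mu, theta, pi):
    """Log-probability of a count k under a zero-inflated negative binomial.

    Assumed parameterization (not taken from the paper):
    mu    -- negative binomial mean, mu > 0
    theta -- negative binomial dispersion, theta > 0
    pi    -- probability of a structural (excess) zero, 0 <= pi < 1
    """
    # Negative binomial log-pmf in the mean/dispersion form.
    log_nb = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    if k == 0:
        # A zero is either a structural zero or an ordinary NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB component.
    return math.log1p(-pi) + log_nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of several counts with shared parameters."""
    return -sum(zinb_log_pmf(k, mu, theta, pi) for k in counts)
```

In an autoencoder setting, the decoder would normally output `mu`, `theta`, and `pi` for each input feature, and the summed negative log-likelihood would serve as the reconstruction loss; whether Viabin follows exactly this design is an assumption here.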