Abstract
Current technologies allow the sequencing of microbial communities directly from the environment, without prior culturing. One of the major problems in analyzing a microbial sample is taxonomically annotating its reads to identify the species it contains. Taxonomic analysis of microbial communities requires clustering the reads, a process referred to as binning. The main difficulties in metagenomic reads binning are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, and sequencing errors.
In this paper we present MetaProb 2, an unsupervised binning method based on reads assembly and probabilistic k-mer statistics. The novelties of MetaProb 2 are the use of minimizers to efficiently assemble reads into unitigs, and a community detection algorithm based on graph modularity to cluster unitigs and detect representative unitigs. The effectiveness of MetaProb 2 is demonstrated on both simulated and synthetic datasets, in comparison with state-of-the-art binning tools such as MetaProb, AbundanceBin, Bimeta and MetaCluster.
Available at: https://github.com/frankandreace/metaprob2.
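To make the role of minimizers concrete, the following is a minimal Python sketch of minimizer selection. It is not taken from the MetaProb 2 source code. It assumes a plain lexicographic ordering of k-mers and ignores reverse complements, and the function names `minimizers` and `shared_minimizers`, as well as the default values of `k` and `w`, are hypothetical choices made for illustration. The idea is that two overlapping reads tend to select the same minimizers, so candidate overlaps can be found by comparing a small subset of k-mers rather than all of them.

```python
def minimizers(seq, k, w):
    """Return the sorted list of (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). Assumption: plain
    lexicographic order, no canonical (reverse-complement) k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first smallest index, i.e. the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


def shared_minimizers(read_a, read_b, k=4, w=3):
    """Hypothetical helper: k-mers chosen as minimizers in both reads."""
    a = {kmer for _, kmer in minimizers(read_a, k, w)}
    b = {kmer for _, kmer in minimizers(read_b, k, w)}
    return a & b


if __name__ == "__main__":
    print(minimizers("ACGTACGT", 4, 3))
    print(shared_minimizers("ACGTACGT", "GTACGTTT"))
```

Reads that share at least one minimizer would then be candidates for an overlap check. Practical tools usually order k-mers by a hash rather than lexicographically, which avoids a bias toward low-complexity k-mers.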
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Andreace, F., Pizzi, C., Comin, M. (2021). MetaProb 2: Improving Unsupervised Metagenomic Binning with Efficient Reads Assembly Using Minimizers. In: Jha, S.K., Măndoiu, I., Rajasekaran, S., Skums, P., Zelikovsky, A. (eds) Computational Advances in Bio and Medical Sciences. ICCABS 2020. Lecture Notes in Computer Science, vol 12686. Springer, Cham. https://doi.org/10.1007/978-3-030-79290-9_2
DOI: https://doi.org/10.1007/978-3-030-79290-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79289-3
Online ISBN: 978-3-030-79290-9
eBook Packages: Computer Science, Computer Science (R0)