MetaProb 2: Improving Unsupervised Metagenomic Binning with Efficient Reads Assembly Using Minimizers

  • Conference paper
Computational Advances in Bio and Medical Sciences (ICCABS 2020)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12686)

Abstract

Current technologies allow the sequencing of microbial communities directly from the environment without prior culturing. One of the major problems when analyzing a microbial sample is to taxonomically annotate its reads in order to identify the species it contains. Taxonomic analysis of microbial communities requires clustering the reads, a process referred to as binning. The major obstacles to metagenomic read binning are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, and sequencing errors.

In this paper we present MetaProb 2, an unsupervised binning method based on read assembly and probabilistic k-mer statistics. The novelties of MetaProb 2 are the use of minimizers to efficiently assemble reads into unitigs, and a community detection algorithm based on graph modularity that clusters unitigs and detects representative unitigs. The effectiveness of MetaProb 2 is demonstrated on both simulated and synthetic datasets in comparison with state-of-the-art binning tools such as MetaProb, AbundanceBin, Bimeta and MetaCluster.
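
As a rough illustration of the minimizer idea mentioned in the abstract, the sketch below computes generic (w, k)-minimizers of a read and uses shared minimizers as a cheap hint that two reads may overlap. It is not taken from the MetaProb 2 code base: the function names `minimizers` and `shared_minimizers`, the lexicographic ordering of k-mers, and the parameter values are assumptions made for illustration, and reverse complements are ignored.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the (position, k-mer) minimizers of seq.

    Illustrative only: for every window of w consecutive k-mers,
    the lexicographically smallest k-mer is kept (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min over indices returns the first (leftmost) smallest k-mer
        offset = min(range(w), key=lambda j: window[j])
        pick = (start + offset, window[offset])
        # adjacent windows often select the same k-mer; record it once
        if not selected or selected[-1] != pick:
            selected.append(pick)
    return selected


def shared_minimizers(read_a: str, read_b: str, k: int, w: int) -> set[str]:
    """Hypothetical helper: k-mers that are minimizers of both reads."""
    a = {kmer for _, kmer in minimizers(read_a, k, w)}
    b = {kmer for _, kmer in minimizers(read_b, k, w)}
    return a & b


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    print(shared_minimizers("ACGTACGT", "GTACGTTT", k=3, w=2))
```

Because only a subset of k-mers is retained per read, comparing minimizer sets is much cheaper than comparing all k-mers, which is the general reason minimizers are attractive for finding candidate overlaps before assembly.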

Available at: https://github.com/frankandreace/metaprob2.

Author information

Corresponding author

Correspondence to M. Comin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Andreace, F., Pizzi, C., Comin, M. (2021). MetaProb 2: Improving Unsupervised Metagenomic Binning with Efficient Reads Assembly Using Minimizers. In: Jha, S.K., Măndoiu, I., Rajasekaran, S., Skums, P., Zelikovsky, A. (eds) Computational Advances in Bio and Medical Sciences. ICCABS 2020. Lecture Notes in Computer Science, vol 12686. Springer, Cham. https://doi.org/10.1007/978-3-030-79290-9_2

  • DOI: https://doi.org/10.1007/978-3-030-79290-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79289-3

  • Online ISBN: 978-3-030-79290-9

  • eBook Packages: Computer Science, Computer Science (R0)
