A Scalable Reference-Free Metagenomic Binning Pipeline

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10847)

Abstract

Metagenomics studies the microbial genomes in an ecosystem, such as the human gastrointestinal tract, by sequencing thousands of organisms in parallel. The sheer number of genomic fragments is challenging for current metagenomic binning software to process. Here we present a scalable reference-free metagenomic binning pipeline designed to handle large-scale metagenomic data. It allows users to input several terabase pairs (TB) of reads and produces highly accurate binning results, even at the species level. The pipeline outputs all binned species across multiple metagenomic samples together with their estimated relative abundances. We integrate the pipeline into an open-source software package, MetaMat, which is freely available at https://github.com/BioAlgs/MetaMat.

Acknowledgement

This research was supported in part by the National Institutes of Health grant R01 GM113242-01 and the National Science Foundation grants DMS-1440038 and DMS-1440037.

Corresponding author

Correspondence to Xin Xing.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ma, T., Xing, X. (2018). A Scalable Reference-Free Metagenomic Binning Pipeline. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science, vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_7

  • DOI: https://doi.org/10.1007/978-3-319-94968-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94967-3

  • Online ISBN: 978-3-319-94968-0

  • eBook Packages: Computer Science (R0)
