Abstract
Metagenomics studies the microbial genomes in an ecosystem, such as the human gastrointestinal tract, by sequencing thousands of organisms in parallel. The sheer number of genomic fragments is challenging for current metagenomic binning software to process. Here we present a scalable, reference-free metagenomic binning pipeline designed to handle large-scale metagenomic data. It accepts several terabase pairs (Tbp) of reads as input and produces highly accurate binning results, even at the species level. The pipeline outputs all binned species across multiple metagenomic samples, together with their estimated relative abundances. We integrate the pipeline into an open-source software package, MetaMat, which is freely available at https://github.com/BioAlgs/MetaMat.
Acknowledgement
This research was supported in part by the National Institutes of Health grant R01 GM113242-01 and the National Science Foundation grants DMS-1440038 and DMS-1440037.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ma, T., Xing, X. (2018). A Scalable Reference-Free Metagenomic Binning Pipeline. In: Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds) Bioinformatics Research and Applications. ISBRA 2018. Lecture Notes in Computer Science(), vol 10847. Springer, Cham. https://doi.org/10.1007/978-3-319-94968-0_7
DOI: https://doi.org/10.1007/978-3-319-94968-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94967-3
Online ISBN: 978-3-319-94968-0
eBook Packages: Computer Science, Computer Science (R0)