MetaAB - A Novel Abundance-Based Binning Approach for Metagenomic Sequences

  • Conference paper
Nature of Computation and Communication (ICTCC 2014)

Abstract

Metagenomics is the study of microbial communities that works directly on genetic material obtained from environmental samples, without isolating and culturing single organisms in the laboratory. One of the crucial tasks in metagenomic projects is the identification and taxonomic characterization of the DNA sequences in the samples. In this paper, we present an unsupervised method for binning metagenomic reads, called MetaAB, which classifies reads into groups of genomes using information on genome abundances. The method is based on a proposed reduced-dimension model that is theoretically proven to require less computation time. In addition, MetaAB automatically detects the number of genome abundance groups in the data by using the Bayesian Information Criterion. Experimental results show that the proposed method achieves higher accuracy and runs faster than a recent abundance-based binning approach. The software implementing the algorithm can be downloaded at http://it.hcmute.edu.vn/bioinfo/metaab/index.htm
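
As a rough illustration of how the Bayesian Information Criterion can pick the number of abundance groups, the sketch below fits Poisson mixtures with 1 to max_k components by expectation-maximization and keeps the one with the lowest BIC. It is a generic example, not MetaAB's reduced-dimension model: the Poisson mixture assumption, the function names (fit_poisson_mixture, select_num_groups, assign) and the toy counts are all assumptions made for this illustration and do not come from the paper.

```python
import math


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass at count x with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def component_scores(x, weights, lams):
    """Unnormalized log-posterior of each mixture component for count x."""
    return [math.log(w) + poisson_logpmf(x, l) for w, l in zip(weights, lams)]


def log_likelihood(counts, weights, lams):
    """Total log-likelihood of the counts under the mixture (log-sum-exp)."""
    total = 0.0
    for x in counts:
        logs = component_scores(x, weights, lams)
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def fit_poisson_mixture(counts, k, iters=200):
    """Fit a k-component Poisson mixture with plain EM (hypothetical helper)."""
    n = len(counts)
    ordered = sorted(counts)
    # Initialize the means at evenly spaced quantiles of the data.
    lams = [max(float(ordered[int((j + 0.5) * n / k)]), 0.5) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each count.
        resp = []
        for x in counts:
            logs = component_scores(x, weights, lams)
            m = max(logs)
            probs = [math.exp(v - m) for v in logs]
            s = sum(probs)
            resp.append([p / s for p in probs])
        # M-step: re-estimate mixing weights and Poisson means.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weighted_sum = sum(r[j] * x for r, x in zip(resp, counts))
            weights[j] = max(nj / n, 1e-12)
            lams[j] = max(weighted_sum / max(nj, 1e-12), 1e-6)
    return weights, lams


def bic(counts, weights, lams):
    """BIC = -2 log L + p ln n, with p = k means + (k - 1) free weights."""
    free_params = 2 * len(lams) - 1
    return -2.0 * log_likelihood(counts, weights, lams) + free_params * math.log(len(counts))


def select_num_groups(counts, max_k=4):
    """Return (k, weights, lams) for the mixture with the lowest BIC."""
    best = None
    for k in range(1, max_k + 1):
        weights, lams = fit_poisson_mixture(counts, k)
        score = bic(counts, weights, lams)
        if best is None or score < best[0]:
            best = (score, k, weights, lams)
    return best[1], best[2], best[3]


def assign(counts, weights, lams):
    """Assign each count to its most probable abundance group."""
    labels = []
    for x in counts:
        scores = component_scores(x, weights, lams)
        labels.append(scores.index(max(scores)))
    return labels


if __name__ == "__main__":
    # Toy data (made up): counts from a low- and a high-abundance group.
    toy = [2, 3, 3, 4, 1, 3, 5, 2, 29, 31, 30, 28, 33, 30, 27, 32]
    k, weights, lams = select_num_groups(toy)
    print(k, [round(l, 2) for l in lams], assign(toy, weights, lams))
```

In an actual binning pipeline the inputs would be l-mer occurrence counts derived from the reads rather than a short hand-written list, and each read would be assigned to a group from the counts of the l-mers it contains.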

Author information

Corresponding author

Correspondence to Van-Vinh Le.

Copyright information

© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Le, VV., Van Lang, T., Van Hoai, T. (2015). MetaAB - A Novel Abundance-Based Binning Approach for Metagenomic Sequences. In: Vinh, P., Vassev, E., Hinchey, M. (eds) Nature of Computation and Communication. ICTCC 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 144. Springer, Cham. https://doi.org/10.1007/978-3-319-15392-6_13

  • DOI: https://doi.org/10.1007/978-3-319-15392-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15391-9

  • Online ISBN: 978-3-319-15392-6

  • eBook Packages: Computer Science (R0)
