
naiveBayesCall: An Efficient Model-Based Base-Calling Algorithm for High-Throughput Sequencing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra-high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with high discrimination ability. However, BayesCall is too computationally expensive to be of broad practical use. In this paper, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall while maintaining a comparably high level of accuracy. Our new algorithm, called naiveBayesCall, uses approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly when the coverage is low to moderate.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kao, W.C., Song, Y.S. (2010). naiveBayesCall: An Efficient Model-Based Base-Calling Algorithm for High-Throughput Sequencing. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_15


  • DOI: https://doi.org/10.1007/978-3-642-12683-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
