Skip to main content

Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2008)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 4955))

Abstract

Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown if it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy-counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. Coli and predict copy-counts with extremely high accuracy, while assembling long contigs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network flows: theory, algorithms, and applications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA (1993)

    Google Scholar 

  2. Appa, G., Kotnyek, B.: A bidirected generalization of network matrices. Networks 47(4), 185–198 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  3. Batzoglou, S., Jaffe, D.B., Stanley, K., Butler, J., Gnerre, S., Mauceli, E., Berger, B., Mesirov, J.P., Lander, E.S.: Arachne: a whole-genome shotgun assembler. Genome Res. 12(1), 177–189 (2002)

    Article  Google Scholar 

  4. Chaisson, M., Pevzner, P.A., Tang, H.: Fragment assembly with short reads. Bioinformatics 20(13), 2067–2074 (2004)

    Article  Google Scholar 

  5. Chaisson, M.J., Pevzner, P.A.: Short read fragment assembly of bacterial genomes. Genome. Published online before print Doi:10.1101/gr.7088808 (2007)

    Google Scholar 

  6. Dohm, J., Lottaz, C., Borodina, T., Himmelbauer, H.: SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res. 17, 1697–1706 (2007)

    Article  Google Scholar 

  7. Edmonds, J.: An introduction to matching. In: Notes of engineering summer conference, University of Michigan, Ann Arbor (1967)

    Google Scholar 

  8. Gabow, H.N.: An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems. In: STOC, pp. 448–456 (1983)

    Google Scholar 

  9. Goldberg, A.V.: An efficient implementation of a scaling minimum-cost flow algorithm. J. Algorithms 22(1), 1–29 (1997)

    Article  MathSciNet  Google Scholar 

  10. Hochbaum, D.S.: Monotonizing linear programs with up to two nonzeroes per column. Oper. Res. Lett. 32(1), 49–58 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  11. Chen, J., Skiena, S.: Assembly For Double-Ended Short-Read Sequencing Technologies. In: Mardis, E., Kim, S., Tang, H. (eds.) Advances in Genome Sequencing Technology and Algorithms, pp. 123–141. Artech House Publishers (2007)

    Google Scholar 

  12. Jeck, W.R., Reinhardt, J.A., Baltrus, D.A., Hickenbotham, M.T., Magrini, V., Mardis, E.R., Dangl, J.L., Jones, C.D.: Extending assembly of short DNA sequences to handle error. Bioinformatics 23, 2942–2944 (2007)

    Article  Google Scholar 

  13. Kececioglu, J.D.: Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, University of Arizona, Tucson, AZ, USA (1992)

    Google Scholar 

  14. Medvedev, P., Georgiou, K., Myers, G., Brudno, M.: Computability of models for sequence assembly. In: WABI, pp. 289–301 (2007)

    Google Scholar 

  15. Myers, E.W.: The fragment assembly string graph. In: ECCB/JBI, p. 85 (2005)

    Google Scholar 

  16. Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H.J., Remington, K.A., Anson, E.L., Bolanos, R.A., Chou, H.-H., Jordan, C.M., Halpern, A.L., Lonardi, S., Beasley, E.M., Brandon, R.C., Chen, L., Dunn, P.J., Lai, Z., Liang, Y., Nusskern, D.R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G.M., Adams, M.D., Venter, J.C.: A Whole-Genome Assembly of Drosophila. Science 287, 2196–2204 (2000)

    Article  Google Scholar 

  17. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences 98, 9748–9753 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  18. Pevzner, P.A., Tang, H.: Fragment assembly with double-barreled data. In: ISMB (Supplement of Bioinformatics), pp. 225–233 (2001)

    Google Scholar 

  19. Pevzner, P.A., Tang, H., Tesler, G.: De novo repeat classification and fragment assembly. In: RECOMB, pp. 213–222 (2004)

    Google Scholar 

  20. Warren, R.L., Sutton, G.G., Jones, S.J., Holt, R.A.: Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23, 500–501 (2007)

    Article  Google Scholar 

  21. http://www.ebi.ac.uk/~zerbino/velvet/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Martin Vingron Limsoon Wong

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Medvedev, P., Brudno, M. (2008). Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science(), vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_5

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-78839-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78838-6

  • Online ISBN: 978-3-540-78839-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics