Design of an Efficient Out-of-Core Read Alignment Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6293)

Abstract

New genome sequencing technologies are poised to enter the sequencing landscape, producing read data at significantly higher throughput, unprecedented speed, and lower cost per run. However, current in-memory methods for aligning a set of reads to one or more reference genomes are ill-equipped to handle the expected growth in read throughput from these technologies.

This paper reports the design of a new out-of-core read mapping algorithm, Syzygy, which can scale to large volumes of read and genome data. The algorithm is designed to run in a constant, user-stipulated amount of main memory (small enough to fit on standard desktops), irrespective of the sizes of the read and genome data. Syzygy achieves a superior spatial locality of reference, which allows all large data structures used by the algorithm to be maintained on disk. We compare our prototype implementation with several popular read alignment programs. Our results demonstrate that Syzygy can scale to very large read volumes while using only a fraction of the memory those programs require, without sacrificing performance.
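
To make the idea of running within a user-stipulated memory budget concrete, the following is a minimal sketch of a generic external merge sort, a standard out-of-core technique. It is not Syzygy's algorithm, which the abstract does not describe; the function names and the `max_in_memory` and `tmp_dir` parameters are hypothetical and chosen only for illustration. At most `max_in_memory` records are held in memory while sorted runs are built on disk, and the final merge reads each run sequentially, which is the kind of access pattern that keeps disk-resident data practical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_chunk: List[str], tmp_dir: str) -> str:
    """Write one sorted chunk to its own run file and return the path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(line + "\n" for line in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a run file back one record at a time (sequential access)."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


def external_sort(records: Iterable[str], max_in_memory: int,
                  tmp_dir: str) -> Iterator[str]:
    """Yield records in sorted order, buffering at most max_in_memory
    records while runs are built. Records must not contain newlines.
    (Hypothetical helper for illustration, not the paper's method.)"""
    run_paths: List[str] = []
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) >= max_in_memory:
            # Budget reached: sort what we have and move it to disk.
            run_paths.append(_spill(sorted(chunk), tmp_dir))
            chunk = []
    if chunk:
        run_paths.append(_spill(sorted(chunk), tmp_dir))
    try:
        # The k-way merge keeps only one pending record per run in memory.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)
```

The merge step holds one record per run, so its memory use grows with the number of runs; a strictly constant budget would need a multi-pass merge once the run count gets large. That refinement is omitted here to keep the sketch short.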


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Konagurthu, A.S., Allison, L., Conway, T., Beresford-Smith, B., Zobel, J. (2010). Design of an Efficient Out-of-Core Read Alignment Algorithm. In: Moulton, V., Singh, M. (eds) Algorithms in Bioinformatics. WABI 2010. Lecture Notes in Computer Science, vol 6293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15294-8_16

  • DOI: https://doi.org/10.1007/978-3-642-15294-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15293-1

  • Online ISBN: 978-3-642-15294-8

  • eBook Packages: Computer Science, Computer Science (R0)
