Read Mapping Algorithms for Single Molecule Sequencing Data

Yanovsky, Vladimir; Rumble, Stephen M.; Brudno, Michael

doi:10.1007/978-3-540-87361-7_4

Vladimir Yanovsky¹, Stephen M. Rumble¹ & Michael Brudno¹,²

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing while sampling millions of reads in a day. At the same time, the technology suffers from a significantly higher error rate, which is ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near-optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches into an algorithm that is both fast and accurate, because it takes full advantage of both reads sampled during two-pass sequencing.
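To make the filtering and matching steps named in the abstract more concrete, the following is a minimal Python sketch of the general idea: spaced-seed k-mer lookup to propose candidate locations on a reference, followed by a Smith-Waterman score. It is not the authors' implementation. The function names, the seed mask `1101`, the `min_hits` threshold, and the scoring values are all illustrative assumptions, the Smith-Waterman here is a plain scalar version rather than a vectorized one, and the Weighted Sequence Graph step is not modelled at all.

```python
from collections import defaultdict


def spaced_kmers(seq, mask):
    """Yield (offset, key) for every window of seq.

    mask is a string of '1' (position must match) and '0' (don't care);
    the key keeps only the characters under the '1' positions.
    """
    span = len(mask)
    keep = [i for i, c in enumerate(mask) if c == "1"]
    for off in range(len(seq) - span + 1):
        yield off, "".join(seq[off + i] for i in keep)


def build_index(reference, mask):
    """Map each spaced-seed key to the reference offsets where it occurs."""
    index = defaultdict(list)
    for off, key in spaced_kmers(reference, mask):
        index[key].append(off)
    return index


def candidate_diagonals(read, index, mask, min_hits=2):
    """Return diagonals (ref_offset - read_offset) with >= min_hits seed hits.

    min_hits is a hypothetical threshold chosen for illustration.
    """
    hits = defaultdict(int)
    for off, key in spaced_kmers(read, mask):
        for ref_off in index.get(key, ()):
            hits[ref_off - off] += 1
    return sorted(d for d, n in hits.items() if n >= min_hits)


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, scalar version with a linear gap penalty.

    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

In a pipeline of this shape, `build_index` would be run once over the reference, `candidate_diagonals` would be called per read, and `smith_waterman_score` would be applied only to the reference windows around the surviving diagonals. The don't-care position in the mask lets a seed still hit when the read has a mismatch at that position, which is the usual motivation for spaced seeds on error-prone reads.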

Author information

Authors and Affiliations

Department of Computer Science,
Vladimir Yanovsky, Stephen M. Rumble & Michael Brudno
Donnelly Centre for Cellular and Biomolecular Research, University of Toronto,
Michael Brudno

Editor information

Keith A. Crandall, Jens Lagergren

Cite this paper

Yanovsky, V., Rumble, S.M., Brudno, M. (2008). Read Mapping Algorithms for Single Molecule Sequencing Data. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_4

DOI: https://doi.org/10.1007/978-3-540-87361-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science, Computer Science (R0)
