A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm

  • Conference paper
Combinatorial Pattern Matching (CPM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7922)

Abstract

Mapping of next-generation sequencing data and other processor-intensive sequence comparison applications have motivated a continued search for high-efficiency sequence alignment algorithms. In one approach, which exploits the inherent parallelism in computer logic calculations, individual cells in an alignment scoring matrix are represented as bits in a computer word, and the calculation of scores is emulated by a series of bit operations composed of AND, OR, XOR, complement, shift, and addition. Bit-parallelism has been successfully applied to the Longest Common Subsequence (LCS) and edit-distance problems, producing solutions that are significantly faster than standard implementations. However, the intensive mental effort required to produce these solutions, which are closely tied to special properties of the problems, has limited efforts to extend bit-parallelism to more general scoring schemes. In this paper, we give the first bit-parallel solution for general, integer-scoring global alignment. Integer-scoring schemes, which are widely used, assign integer weights for match, mismatch, and insertion/deletion (indel). Our method depends on structural properties of the relationship between adjacent scores in the scoring matrix. We use these properties to construct a class of efficient algorithms, each designed for a particular set of weights, and we introduce a standard for characterizing the efficiency in terms of the average number of bit operations per cell of the original scoring matrix.
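To make the two ideas in the abstract concrete, the following Python sketch may help. It is not the algorithm of this paper. The first function is the standard dynamic program for global alignment under integer weights for match, mismatch, and indel. The second is a well-known style of bit-parallel LCS-length computation, in which one column of the matrix is packed into the bits of an integer and updated using only AND, OR, addition, and subtraction. The function names and the choice of Python's arbitrary-precision integers in place of fixed-width machine words are assumptions made for illustration.

```python
def global_alignment_score(x, y, match, mismatch, indel):
    """Reference dynamic program: one matrix cell per (i, j) pair.

    Scores are maximized; match, mismatch, and indel are integer weights.
    """
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * indel]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev[-1]


def lcs_length_bit_parallel(x, y):
    """Bit-parallel LCS length: one bit per cell of a matrix column.

    Illustrative sketch only. A Python int stands in for a sequence of
    machine words, so no explicit carry handling between words is needed.
    """
    m = len(x)
    mask = (1 << m) - 1

    # match_bits[ch] has bit i set exactly when x[i] == ch.
    match_bits = {}
    for i, ch in enumerate(x):
        match_bits[ch] = match_bits.get(ch, 0) | (1 << i)

    v = mask  # all ones: no position of x has contributed to the LCS yet
    for ch in y:
        u = v & match_bits.get(ch, 0)
        # The addition propagates a carry through runs of ones, which is
        # what updates many cells of the column in a single operation.
        v = ((v + u) | (v - u)) & mask

    # Each zero bit in v marks one unit of LCS length.
    return m - bin(v).count("1")


print(lcs_length_bit_parallel("AGCAT", "GAC"))          # 2
print(global_alignment_score("AGCAT", "GAC", 1, 0, 0))  # 2
```

With weights match = 1, mismatch = 0, and indel = 0, the global alignment score equals the LCS length, which is why the two calls agree. For other integer weights the simple column encoding above no longer applies, and that gap is what the paper addresses.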

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benson, G., Hernandez, Y., Loving, J. (2013). A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_7

  • DOI: https://doi.org/10.1007/978-3-642-38905-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38904-7

  • Online ISBN: 978-3-642-38905-4

  • eBook Packages: Computer Science, Computer Science (R0)
