Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4614)

Abstract

DNA sequences are the fundamental information for each species, and comparing the DNA sequences of different species is an important task. Because DNA sequences are very long and many species exist, efficient storage matters as much as fast matching. A fast string matching method suited to encoded DNA sequences is therefore needed. In this paper, we present a fast string matching method for encoded DNA sequences that does not decode the sequences while matching. We use a four-characters-to-one-byte encoding and combine a suffix approach with a multi-pattern matching approach. Experimental results show that our method is about 5 times faster than AGREP and is the fastest among known algorithms.
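
To make the encoding idea concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a 2-bit code per base (the assignment `A=0, C=1, G=2, T=3` is hypothetical, as are the names `encode`, `base_at`, and `find_all`) and packs four bases into each byte. It also illustrates one plausible reason a multi-pattern view arises: a pattern may start at any of four positions inside a byte, so a single pattern yields up to four byte-aligned encoded variants. The search here uses Python's built-in `bytes.find` rather than the suffix-based technique the abstract refers to.

```python
# Hypothetical 2-bit codes; the abstract does not specify the actual assignment.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> bytes:
    """Pack four bases into each byte, first base in the high-order bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


def base_at(packed: bytes, i: int) -> int:
    """Return the 2-bit code of base i without unpacking the whole sequence."""
    return (packed[i // 4] >> (2 * (3 - i % 4))) & 3


def find_all(packed: bytes, n: int, pattern: str) -> list[int]:
    """Find all occurrences of pattern in a packed text holding n bases.

    For each of the four possible start offsets r inside a byte, the part of
    the pattern that covers whole bytes is encoded and searched for directly
    in the packed text; each candidate is then verified base by base.
    """
    m = len(pattern)
    hits = []
    for r in range(4):
        k = (4 - r) % 4            # pattern bases before the first byte boundary
        q = max(0, m - k) // 4     # whole bytes covered by the pattern
        if q == 0:
            # Pattern too short to cover a whole byte at this offset:
            # fall back to checking every position with this offset.
            candidates = list(range(r, n - m + 1, 4))
        else:
            key = encode(pattern[k:k + 4 * q])
            candidates = []
            pos = packed.find(key)
            while pos != -1:
                candidates.append(4 * pos - k)
                pos = packed.find(key, pos + 1)
        for p in candidates:
            # The bounds check also rejects matches against padding bits.
            if p >= 0 and p + m <= n and all(
                base_at(packed, p + j) == CODE[pattern[j]] for j in range(m)
            ):
                hits.append(p)
    return sorted(hits)
```

For example, `find_all(encode(text), len(text), pattern)` returns the 0-based start positions of `pattern` in `text`. The verification step re-checks every base for simplicity; a tuned implementation would only need to check the bases outside the byte-aligned key.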

This work was supported by FPR05A2-341 of the 21C Frontier Functional Proteomics Project from the Korean Ministry of Science & Technology.

References

  1. Amir, A., Benson, G.: Efficient Two-Dimensional Compressed Matching. In: Data Compression Conference, pp. 279–288 (1992)

  2. Amir, A., Benson, G., Farach, M.: Let Sleeping Files Lie: Pattern Matching in Z-compressed Files. In: 5th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 705–714 (1994)

  3. Allauzen, C., Crochemore, M., Raffinot, M.: Efficient experimental string matching by weak factor recognition. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 51–72. Springer, Heidelberg (2001)

  4. Baeza-Yates, R., Gonnet, G.H.: A New Approach to Text Searching. Communications of the ACM 35(10), 74–82 (1992)

  5. BLAST, http://www.ncbi.nlm.nih.gov/BLAST

  6. Boyer, R.S., Moore, J.S.: A Fast String Searching Algorithm. Communications of the ACM 20(10), 762–772 (1977)

  7. Charras, C., Lecroq, T., Pehoushek, J.D.: A Very Fast String Matching Algorithm for Small Alphabets and Long Patterns. In: Farach-Colton, M. (ed.) CPM 1998. LNCS, vol. 1448, pp. 55–64. Springer, Heidelberg (1998)

  8. Chen, L., Lu, S., Ram, J.: Compressed Pattern Matching in DNA Sequences. In: CSB 2004. IEEE Computational Systems Bioinformatics Conference, pp. 62–68 (2004)

  9. Commentz-Walter, B.: A String Matching Algorithm Fast on the Average. In: Maurer, H.A. (ed.) Automata, Languages, and Programming. LNCS, vol. 71, pp. 118–132. Springer, Heidelberg (1979)

  10. Commentz-Walter, B.: A String Matching Algorithm Fast on the Average. Technical Report TR 79.09.007, IBM Germany, Heidelberg Scientific Center (1979)

  11. FASTA, http://www.ebi.ac.uk/fasta

  12. Franek, F., Jennings, C.G., Smyth, W.F.: A Simple Fast Hybrid Pattern-Matching Algorithm. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 288–297. Springer, Heidelberg (2005)

  13. Fredriksson, K.: Shift-Or String Matching with Super-Alphabets. Information Processing Letters 87(4), 201–204 (2003)

  14. Fredriksson, K., Grabowski, S.: Practical and Optimal String Matching. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 376–387. Springer, Heidelberg (2005)

  15. Horspool, R.N.: Practical Fast Searching in Strings. Software: Practice and Experience 10(6), 501–506 (1980)

  16. Knuth, D.E., Morris Jr, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6, 323–350 (1977)

  17. Manber, U.: A Text Compression Scheme That Allows Fast Searching Directly in the Compressed File. ACM Transactions on Information Systems 15(2), 124–136 (1997)

  18. de Moura, E.S., Navarro, G., Ziviani, N., Baeza-Yates, R.: Direct Pattern Matching on Compressed Text. In: 5th International Symposium on String Processing and Information Retrieval, pp. 90–95. IEEE Computer Society Press, Los Alamitos (1998)

  19. Navarro, G., Raffinot, M.: Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata. ACM Journal of Experimental Algorithmics 5(4) (2000)

  20. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. Cambridge University Press, Cambridge (2002)

  21. Navarro, G., Raffinot, M.: Practical and Flexible Pattern Matching over Ziv-Lempel Compressed Text. Journal of Discrete Algorithms 2(3), 347–371 (2004)

  22. Navarro, G., Tarhio, J.: LZgrep: a Boyer-Moore String Matching Tool for Ziv-Lempel Compressed Text. Software: Practice and Experience 35(12), 1107–1130 (2005)

  23. Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Speeding Up Pattern Matching by Text Compression. In: Bongiovanni, G., Petreschi, R., Gambosi, G. (eds.) CIAC 2000. LNCS, vol. 1767, pp. 306–315. Springer, Heidelberg (2000)

  24. Shibata, Y., Matsumoto, T., Takeda, M., Shinohara, A., Arikawa, S.: A Boyer-Moore Type Algorithm for Compressed Pattern Matching. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 181–194. Springer, Heidelberg (2000)

  25. Sunday, D.M.: A Very Fast Substring Search Algorithm. Communications of the ACM 33(8), 132–142 (1990)

  26. Tarhio, J., Peltola, H.: String Matching in the DNA Alphabet. Software: Practice and Experience 27(7), 851–861 (1997)

  27. Wu, S., Manber, U.: Fast Text Searching Allowing Errors. Communications of the ACM 35(10), 83–91 (1992)

  28. Wu, S., Manber, U.: AGREP - A Fast Approximate Pattern-matching Tool. In: The Winter 1992 USENIX Conference, pp. 153–162 (1992)

Editor information

Bo Chen, Mike Paterson, Guochuan Zhang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, J.W., Kim, E., Park, K. (2007). Fast Matching Method for DNA Sequences. In: Chen, B., Paterson, M., Zhang, G. (eds) Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. ESCAPE 2007. Lecture Notes in Computer Science, vol 4614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74450-4_25

  • DOI: https://doi.org/10.1007/978-3-540-74450-4_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74449-8

  • Online ISBN: 978-3-540-74450-4

  • eBook Packages: Computer Science, Computer Science (R0)
