Abstract
Given two strings, a pattern P of length m and a text T of length n over an alphabet Σ of size σ, we consider the exact string matching problem, i.e., we want to report all occurrences of P in T. The well-known Backward-Nondeterministic-DAWG-Matching (BNDM) algorithm is one of the most efficient algorithms for patterns of short to moderate length. In this paper, as a prelude, we take the underlying nondeterministic suffix automaton and apply it to the text instead of the pattern. The resulting algorithm is surprisingly simple and, in practice, efficient for relatively short patterns and small alphabets. We then show how the algorithm can easily be adapted to construct the suffix tree of T lazily. Both algorithms are efficient when the text is static but the patterns are given on-line (with no possibility of batching the queries). We discuss several variants of the algorithms and conclude with some experimental results.
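To make the idea of running a nondeterministic automaton over the text more concrete, the following is a minimal illustrative sketch, not the chapter's algorithm. It assumes one possible reading of the abstract: keep one bit per text position and, for each pattern character, advance all active positions at once. The names `build_masks` and `find_occurrences` are hypothetical, and a Python integer stands in for the bit-vector that a real implementation would pack into machine words.

```python
def build_masks(text):
    """Preprocess the static text: bit i of masks[c] is set iff text[i] == c."""
    masks = {}
    for i, c in enumerate(text):
        masks[c] = masks.get(c, 0) | (1 << i)
    return masks


def find_occurrences(text, pattern, masks=None):
    """Return the sorted start positions of pattern in text.

    Illustrative assumption: the automaton state is a bit-vector over text
    positions; bit i is set iff the pattern prefix read so far ends at text[i].
    """
    if masks is None:
        masks = build_masks(text)
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # design choice for this sketch: empty pattern reports nothing

    # After the first character, every position holding that character is active.
    state = masks.get(pattern[0], 0)
    for c in pattern[1:]:
        if state == 0:
            break  # no active position left, so no occurrence exists
        # Move every active position one step right, keep those that read c.
        state = (state << 1) & masks.get(c, 0)

    # Surviving bits mark END positions of occurrences; convert to starts.
    starts = []
    i = 0
    while state:
        if state & 1:
            starts.append(i - m + 1)
        state >>= 1
        i += 1
    return starts
```

In this sketch the masks are built once for the static text and reused for each on-line query, for example `find_occurrences("abracadabra", "abra", masks)`. Each pattern character costs one shift and one AND over an n-bit vector, which is consistent with the abstract's remark that such an approach suits relatively short patterns.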
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fredriksson, K. (2010). From Nondeterministic Suffix Automaton to Lazy Suffix Tree. In: Elomaa, T., Mannila, H., Orponen, P. (eds) Algorithms and Applications. Lecture Notes in Computer Science, vol 6060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12476-1_8
DOI: https://doi.org/10.1007/978-3-642-12476-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12475-4
Online ISBN: 978-3-642-12476-1
eBook Packages: Computer Science, Computer Science (R0)