Abstract
String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications in areas as diverse as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. Nevertheless, automata have always played a central role in the design of efficient string matching algorithms. In this paper we introduce the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string, whose configuration can be encoded in a very simple form and which is particularly well suited to text-searching problems. As a first example of its effectiveness, we present an efficient string matching algorithm based on the Range Automaton, named Backward Range Automaton Matcher (BRAM), which turns out to be very fast in many practical cases. Although our algorithm has quadratic worst-case time complexity, experimental results show that in most cases it achieves the best running times when compared against the most effective automata-based algorithms. For long patterns, the speed-up reaches \(250\%\). This makes our proposed solution one of the most flexible algorithms in practical cases.
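For orientation only, here is a minimal Python sketch of BNDM (Backward Nondeterministic DAWG Matching) by Navarro and Raffinot [13], the classical bit-parallel simulation of the non-deterministic suffix automaton that automata-based matchers such as BRAM are usually compared against. It is not the Range Automaton or BRAM: the Range Automaton's own configuration encoding is described in the paper and in the linked source repository. The function name `bndm_search` and all variable names are illustrative assumptions. Like BRAM, plain BNDM has quadratic \(O(nm)\) worst-case time.

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text (BNDM sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Preprocessing: masks[c] has bit (m - 1 - i) set iff pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1   # all m states of the automaton active
    high = 1 << (m - 1)   # set iff the factor read so far is a prefix of pattern

    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m - 1         # window index of the next character to read (right to left)
        last = m          # default shift when no proper prefix is recognized
        d = full
        while d:
            # Read the window backwards, simulating the non-deterministic
            # suffix automaton of the reversed pattern with one bit per state.
            d &= masks.get(text[pos + j], 0)
            if d & high:
                if j > 0:
                    last = j  # text[pos + j : pos + m] is a proper prefix of pattern
                else:
                    occurrences.append(pos)  # the whole window matches
            j -= 1
            d = (d << 1) & full  # Python ints are unbounded, so mask explicitly
        pos += last
    return occurrences
```

For example, `bndm_search("abc", "xxabcabcx")` returns `[2, 5]`. Because the inner loop scans right to left, the last update of `last` corresponds to the longest proper prefix of the pattern found in the window, which yields the largest shift that cannot skip an occurrence. In C implementations the state vector is a machine word, so \(m\) is limited to the word size; this sketch sidesteps that with Python's arbitrary-precision integers.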
Notes
1. Search speeds of MAS and its variants, MAS\(_4\) and TMAS, have been omitted starting from \(m = 256\), since the preprocessing time of these solutions becomes prohibitive as the pattern length increases.
2. Note that the EPSM algorithm is designed only to count the number of occurrences, without reporting their positions.
3. The source code of the new BRAM algorithm is available at the following link: https://github.com/ostafen/range-automaton.
4. Additional details on the sequences can be found in Faro et al. [10].
References
Baeza-Yates, R.A., Gonnet, G.H.: A new approach to text searching. Commun. ACM 35(10), 74–82 (1992). https://doi.org/10.1145/135239.135243
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977). https://doi.org/10.1145/359842.359859
Cantone, D., Faro, S., Giaquinta, F.: A compact representation of nondeterministic (suffix) automata for the bit-parallel approach. Inf. Comput. 213, 3–12 (2012). https://doi.org/10.1016/j.ic.2011.03.006
Cantone, D., Faro, S., Pavone, A.: Linear and efficient string matching algorithms based on weak factor recognition. ACM J. Exp. Algorithmics 24(1), 1.8:1–1.8:20 (2019). https://doi.org/10.1145/3301295
Crochemore, M.: Text Algorithms. Oxford University Press, Oxford (1994). http://www-igm.univ-mlv.fr/%7Emac/REC/B1.html
Durian, B., Peltola, H., Salmela, L., Tarhio, J.: Bit-parallel search algorithms for long patterns. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 129–140. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13193-6_12
Faro, S., Külekci, M.O.: Fast and flexible packed string matching. J. Discrete Algorithms 28, 61–72 (2014)
Faro, S., Lecroq, T.: A fast suffix automata based algorithm for exact online string matching. In: Moreira, N., Reis, R. (eds.) CIAA 2012. LNCS, vol. 7381, pp. 149–158. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31606-7_13
Faro, S., Lecroq, T.: The exact online string matching problem: a review of the most recent results. ACM Comput. Surv. 45(2), 13:1–13:42 (2013). https://doi.org/10.1145/2431211.2431212
Faro, S., Lecroq, T., Borzi, S., Di Mauro, S., Maggio, A.: The string matching algorithms research tool. In: 2016 Proceedings of the Prague Stringology Conference, pp. 99–111 (2016). http://www.stringology.org/event/2016/p09.html
Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977). https://doi.org/10.1137/0206024
Lecroq, T.: Fast exact string matching algorithms. Inf. Process. Lett. 102(6), 229–235 (2007). https://doi.org/10.1016/j.ipl.2007.01.002
Navarro, G., Raffinot, M.: A bit-parallel approach to suffix automata: fast extended string matching. In: Farach-Colton, M. (ed.) CPM 1998. LNCS, vol. 1448, pp. 14–33. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0030778
Peltola, H., Tarhio, J.: Alternative algorithms for bit-parallel string matching. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 80–93. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39984-1_7
Ryu, C., Lecroq, T., Park, K.: Fast string matching for DNA sequences. Theor. Comput. Sci. 812, 137–148 (2020). https://doi.org/10.1016/j.tcs.2019.09.031
Uratani, N., Takeda, M.: A fast string-searching algorithm for multiple patterns. Inf. Process. Manag. 29(6), 775–792 (1993). https://doi.org/10.1016/0306-4573(93)90106-N
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Faro, S., Scafiti, S. (2021). The Range Automaton: An Efficient Approach to Text-Searching. In: Lecroq, T., Puzynina, S. (eds) Combinatorics on Words. WORDS 2021. Lecture Notes in Computer Science, vol 12847. Springer, Cham. https://doi.org/10.1007/978-3-030-85088-3_8
DOI: https://doi.org/10.1007/978-3-030-85088-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85087-6
Online ISBN: 978-3-030-85088-3
eBook Packages: Computer Science, Computer Science (R0)