The Range Automaton: An Efficient Approach to Text-Searching

  • Conference paper
Combinatorics on Words (WORDS 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12847)

Abstract

String matching is one of the most extensively studied problems in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, information retrieval, data compression, computational biology and chemistry. In the last few decades a myriad of alternative solutions have been proposed, based on very different techniques. However, automata have always played a very important role in the design of efficient string matching algorithms. In this paper we introduce the Range Automaton, a weak yet efficient variant of the non-deterministic suffix automaton of a string, whose configuration can be encoded in a very simple form and which is particularly well suited to solving text-searching problems. As a first example of its effectiveness we present an efficient string matching algorithm based on the Range Automaton, named Backward Range Automaton Matcher, which turns out to be very fast in many practical cases. Although our algorithm has a quadratic worst-case time complexity, experimental results show that in most cases it achieves the best running times when compared against the most effective automata-based algorithms. In the case of long patterns, the speed-up reaches \(250\%\). This makes our proposed solution one of the most flexible algorithms in practical cases.
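
For orientation, the classical way to simulate the non-deterministic suffix automaton mentioned in the abstract is the bit-parallel BNDM scheme of Navarro and Raffinot [13]. The sketch below illustrates that baseline only; it is not the Range Automaton or the Backward Range Automaton Matcher, whose encoding is defined in the paper itself. The function name `bndm_search` and the dictionary-based mask table are assumptions made for this illustration.

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text.

    Illustrative BNDM sketch (bit-parallel suffix automaton), not BRAM.
    Python integers are unbounded, so the pattern length is not limited
    by a machine word as it would be in a C implementation.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # masks[c] has bit (m - 1 - i) set whenever pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    all_ones = (1 << m) - 1
    top_bit = 1 << (m - 1)
    occurrences = []

    pos = 0
    while pos <= n - m:
        j = m          # characters of the window still unread
        last = m       # safe shift, shortened when a pattern prefix is seen
        state = all_ones
        # Scan the window right to left while some factor is still alive.
        while state != 0:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & top_bit:
                if j > 0:
                    # The suffix read so far is a prefix of the pattern.
                    last = j
                else:
                    # The whole window matched.
                    occurrences.append(pos)
            state = (state << 1) & all_ones
        pos += last
    return occurrences
```

Under these assumptions, `bndm_search("abra", "abracadabra")` reports the two occurrences at positions 0 and 7. Like the algorithm described in the abstract, this baseline scans each window backward and has a quadratic worst case, which is why such methods are usually judged by experimental running times.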

Notes

  1. Search speed of MAS and its variants, MAS\(_4\) and TMAS, has been omitted starting from \(m = 256\), since the preprocessing time of such solutions becomes prohibitive as the length of the pattern increases.

  2. We note that the EPSM algorithm is designed for simply counting the number of matching occurrences without reporting the corresponding positions.

  3. The source code of the new BRAM algorithm is available at the following link: https://github.com/ostafen/range-automaton.

  4. Additional details on the sequences can be found in Faro et al. [10].

References

  1. Baeza-Yates, R.A., Gonnet, G.H.: A new approach to text searching. Commun. ACM 35(10), 74–82 (1992). https://doi.org/10.1145/135239.135243

  2. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977). https://doi.org/10.1145/359842.359859

  3. Cantone, D., Faro, S., Giaquinta, F.: A compact representation of nondeterministic (suffix) automata for the bit-parallel approach. Inf. Comput. 213, 3–12 (2012). https://doi.org/10.1016/j.ic.2011.03.006

  4. Cantone, D., Faro, S., Pavone, A.: Linear and efficient string matching algorithms based on weak factor recognition. ACM J. Exp. Algorithmics 24(1), 1.8:1–1.8:20 (2019). https://doi.org/10.1145/3301295

  5. Crochemore, M.: Text Algorithms. Oxford University Press, Oxford (1994). http://www-igm.univ-mlv.fr/%7Emac/REC/B1.html

  6. Durian, B., Peltola, H., Salmela, L., Salmela, J.: Bit-parallel search algorithms for long patterns. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 129–140. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13193-6_12

  7. Faro, S., Külekci, M.O.: Fast and flexible packed string matching. J. Discrete Algorithms 28, 61–72 (2014)

  8. Faro, S., Lecroq, T.: A fast suffix automata based algorithm for exact online string matching. In: Moreira, N., Reis, R. (eds.) CIAA 2012. LNCS, vol. 7381, pp. 149–158. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31606-7_13

  9. Faro, S., Lecroq, T.: The exact online string matching problem: a review of the most recent results. ACM Comput. Surv. 45(2), 13:1–13:42 (2013). https://doi.org/10.1145/2431211.2431212

  10. Faro, S., Lecroq, T., Borzi, S., Di Mauro, S., Maggio, A.: The string matching algorithms research tool. In: 2016 Proceedings of the Prague Stringology Conference, pp. 99–111 (2016). http://www.stringology.org/event/2016/p09.html

  11. Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977). https://doi.org/10.1137/0206024

  12. Lecroq, T.: Fast exact string matching algorithms. Inf. Process. Lett. 102(6), 229–235 (2007). https://doi.org/10.1016/j.ipl.2007.01.002

  13. Navarro, G., Raffinot, M.: A bit-parallel approach to suffix automata: fast extended string matching. In: Farach-Colton, M. (ed.) CPM 1998. LNCS, vol. 1448, pp. 14–33. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0030778

  14. Peltola, H., Tarhio, J.: Alternative algorithms for bit-parallel string matching. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 80–93. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39984-1_7

  15. Ryu, C., Lecroq, T., Park, K.: Fast string matching for DNA sequences. Theor. Comput. Sci. 812, 137–148 (2020). https://doi.org/10.1016/j.tcs.2019.09.031

  16. Uratani, N., Takeda, M.: A fast string-searching algorithm for multiple patterns. Inf. Process. Manag. 29(6), 775–792 (1993). https://doi.org/10.1016/0306-4573(93)90106-N

Author information

Corresponding author

Correspondence to Simone Faro.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Faro, S., Scafiti, S. (2021). The Range Automaton: An Efficient Approach to Text-Searching. In: Lecroq, T., Puzynina, S. (eds) Combinatorics on Words. WORDS 2021. Lecture Notes in Computer Science, vol 12847. Springer, Cham. https://doi.org/10.1007/978-3-030-85088-3_8

  • DOI: https://doi.org/10.1007/978-3-030-85088-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85087-6

  • Online ISBN: 978-3-030-85088-3

  • eBook Packages: Computer Science, Computer Science (R0)
