
Automata with Bounded Repetition in RE2

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2022 (EUROCAST 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13789)


Abstract

Regular expression (regex) matching plays an irreplaceable role in software development. It is a computationally intensive process that is often applied to large texts, so the predictability of its performance has a significant impact on the overall usability of software applications in practice. The problem is that standard approaches to regex matching suffer from high worst-case complexity: an unlucky combination of a regex and a text may increase the matching time by orders of magnitude. This can open the door to the so-called Regular Expression Denial of Service (ReDoS) attack, in which the attacker causes a denial of service by providing a specially crafted regex or text. We focus on one source of these attacks: regexes with bounded repetition (e.g., `(ab){100}`). A succinct representation and fast matching of such regexes can be achieved using a novel model, the counting-set automaton. We present a C++ implementation of a matching algorithm based on the counting-set automaton. The implementation is built within RE2, a fast state-of-the-art regex matcher. We perform experiments on real-life regexes, and they show that the implementation within RE2 is faster than the original C# implementation.
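To make the counting-set idea concrete, here is a minimal sketch under stated assumptions, not the authors' RE2 or C# code: a hand-written simulation for the single regex `.*a.{k}`, a classic case whose minimal DFA needs exponentially many states in k. Instead of unrolling `.{k}` into k copies, the sketch keeps one automaton state for `.{k}` together with the *set* of counter values currently active in it. The function name `matches_a_then_k` and the two-state automaton are hypothetical, and `std::set` is used for readability, so each step costs time proportional to the set size (at most k + 1 values) rather than constant time.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical illustration: full match of the regex `.*a.{k}` by simulating
// a two-state counting-set automaton.
//   q0: the `.*` prefix, always active (self-loop on every character).
//   q1: the `.{k}` suffix, active with a SET of counter values, where a value v
//       means "v characters have been read since the `a`".
bool matches_a_then_k(const std::string& text, int k) {
  std::set<int> counters;  // counter values currently held by q1
  for (char c : text) {
    std::set<int> next;
    // q1 --any char--> q1 with counter + 1, allowed only while counter < k.
    for (int v : counters) {
      if (v < k) next.insert(v + 1);
    }
    // q0 --'a'--> q1 with the counter initialized to 0.
    if (c == 'a') next.insert(0);
    counters = std::move(next);
  }
  // Accept if q1 holds the value k, i.e. exactly k characters followed an `a`.
  return counters.count(k) > 0;
}
```

For example, `matches_a_then_k("aab", 2)` returns `true`, while `matches_a_then_k("ba", 1)` returns `false`. The automaton stays at two states regardless of k; only the counter set grows.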

This work has been supported by the FIT BUT internal project FIT-S-20-6427 and the Czech Science Foundation (project No. 19-24397S).


Notes

  1.

    These operations can be implemented to work in constant time (see [18]); hence, simulating a counting-set automaton (CsA) yields a fast matching algorithm for bounded repetition.

  2.

    The implementation can be found at https://gitlab.com/MichalHorky/DP-re2-repository.
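Note 1 says the counting-set operations can run in constant time. One way to get there, sketched here as an assumption rather than as the data structure of [18] or of the RE2 implementation, is to store every value as its distance from a shared offset, so that incrementing all values is a single addition and the values stay sorted by insertion time. The class `CountingSet`, its method names, and the matcher `matches_a_then_k_csa` (again for the illustrative regex `.*a.{k}`) are hypothetical.

```cpp
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical sketch: a counting set holding distinct values in [0, max]
// with O(1) operations. A value v is stored as the key offset_ - v, so one
// ++offset_ increments every value at once. Keys are appended in increasing
// order: the front holds the largest value, the back holds the smallest.
class CountingSet {
 public:
  explicit CountingSet(std::uint64_t max) : max_(max) {}

  bool empty() const { return keys_.empty(); }

  // Largest and smallest stored value; both require !empty().
  std::uint64_t max_value() const { return offset_ - keys_.front(); }
  std::uint64_t min_value() const { return offset_ - keys_.back(); }

  // Add 1 to every value. Values were distinct and <= max_, so only the
  // largest one (at the front) can now exceed max_; drop it if so.
  void increment() {
    ++offset_;
    if (!keys_.empty() && max_value() > max_) keys_.pop_front();
  }

  // Insert the value 0. If 0 is already present it is the smallest value,
  // i.e. the back key equals offset_, so the duplicate check is O(1).
  void insert_zero() {
    if (keys_.empty() || keys_.back() != offset_) keys_.push_back(offset_);
  }

 private:
  std::uint64_t max_;
  std::uint64_t offset_ = 0;
  std::deque<std::uint64_t> keys_;  // at most max_ + 1 keys
};

// Full match of `.*a.{k}` with one CountingSet as the counter of the state
// that reads `.{k}`; each input character costs O(1).
bool matches_a_then_k_csa(const std::string& text, std::uint64_t k) {
  CountingSet counter(k);
  for (char c : text) {
    counter.increment();                  // every pending `.{k}` reads one more character
    if (c == 'a') counter.insert_zero();  // a matched `a` starts a fresh count
  }
  // All stored values are <= k, so k is present iff it is the maximum.
  return !counter.empty() && counter.max_value() == k;
}
```

The design relies on two invariants: values only ever grow together, and new values always enter at 0. Together they keep the set sorted by insertion order, which is why the largest value can be read and discarded at the front and the value 0 tested at the back without any search.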

References

  1. Outage postmortem. https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016, Accessed 14 Mar 2022

  2. Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in Python. In: Proceedings of ISSTA 2016. Association for Computing Machinery (2016)

  3. Davis, J.C.: Rethinking regex engines to address ReDoS. In: Proceedings of ESEC/FSE 2019. Association for Computing Machinery (2019)

  4. Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: Proceedings of ESEC/FSE 2018. Association for Computing Machinery (2018)

  5. Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of ESEC/FSE 2019. Association for Computing Machinery (2019)

  6. docs.rs: regex - Rust. https://docs.rs/regex/1.5.4/regex/

  7. Google: RE2. https://github.com/google/re2

  8. Graham-Cumming, J.: Details of the Cloudflare outage on July 2, 2019. https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019, Accessed 14 Mar 2022

  9. Haertel, M., et al.: GNU grep. https://www.gnu.org/software/grep/

  10. Holík, L., Lengál, O., Saarikivi, O., Turoňová, L., Veanes, M., Vojnar, T.: Succinct determinisation of counting automata via sphere construction. In: Lin, A.W. (ed.) APLAS 2019. LNCS, vol. 11893, pp. 468–489. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34175-6_24

  11. Roesch, M., et al.: Snort: a network intrusion detection and prevention system. http://www.snort.org

  12. RegExLib.com: The Internet’s first Regular Expression Library. http://regexlib.com/

  13. Sommer, R., et al.: The Bro Network Security Monitor. http://www.bro.org

  14. Cox, R.: Regular expression matching in the wild (2010). https://swtch.com/rsc/regexp/regexp3.html, Accessed 18 May 2021

  15. Sipser, M.: Introduction to the theory of computation. SIGACT News 27(1), 27–29 (1996)

  16. The Sagan team: The Sagan Log Analysis Engine. https://quadrantsec.com/sagan_log_analysis_engine/

  17. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)

  18. Turoňová, L., Holík, L., Lengál, O., Saarikivi, O., Veanes, M., Vojnar, T.: Regex matching with counting-set automata. Proc. ACM Program. Lang. 4(OOPSLA) (2020)

  19. Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 155–175. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3_9

  20. Yang, L., Karim, R., Ganapathy, V., Smith, R.: Improving NFA-based signature matching using ordered binary decision diagrams. In: Jha, S., Sommer, R., Kreibich, C. (eds.) RAID 2010. LNCS, vol. 6307, pp. 58–78. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15512-3_4


Author information

Corresponding author

Correspondence to Lenka Turoňová.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Horký, M., Síč, J., Turoňová, L. (2022). Automata with Bounded Repetition in RE2. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_27

  • DOI: https://doi.org/10.1007/978-3-031-25312-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25311-9

  • Online ISBN: 978-3-031-25312-6

  • eBook Packages: Computer Science, Computer Science (R0)