Smaller Representation of Compiled Regular Expressions

  • Conference paper
Implementation and Application of Automata (CIAA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14151)

Abstract

We consider the problem of running regex pattern matching in a space-efficient manner. Given a regex, we suggest a bit-packing scheme that represents its compiled form, the position automaton, in a compressed way. Our scheme reduces the representation size further by relying on the homogeneous property of position automata and on practical features of regexes. We implement the proposed scheme and evaluate its memory consumption on a practical regex benchmark dataset. Our approach produces a much smaller representation than two common FA representations. In addition, experimental results show that our bit-packing regex engine is effective for matching regexes with large compiled forms: it consumes less memory than the current state-of-the-art regex engine (RE2).
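As a rough illustration of why the homogeneous property helps, the sketch below stores a position automaton without any per-transition labels. It is not the authors' scheme, whose details are not given here. In a position (Glushkov) automaton, every transition entering a state carries the same symbol, so one symbol per state plus a bitmask of successor states is enough. The regex `(a|b)*abb`, its hand-written follow sets, and all names (`SYMBOL`, `FOLLOW`, `pack`, `matches`) are assumptions made for this example.

```python
# Position automaton for (a|b)*abb, written out by hand (illustrative only).
# State 0 is the initial state; states 1..5 are the positions a1 b2 a3 b4 b5.
SYMBOL = [None, "a", "b", "a", "b", "b"]  # homogeneity: one incoming symbol per state
FOLLOW = {0: {1, 2, 3}, 1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: set()}
FINAL = {5}


def pack(states):
    """Pack a set of state numbers into one integer, one bit per state."""
    mask = 0
    for s in states:
        mask |= 1 << s
    return mask


def build_tables():
    """Return (successor bitmask per state, state bitmask per symbol, final bitmask)."""
    follow_mask = [pack(FOLLOW[s]) for s in range(len(SYMBOL))]
    sym_mask = {}
    for s, c in enumerate(SYMBOL):
        if c is not None:
            sym_mask[c] = sym_mask.get(c, 0) | (1 << s)
    return follow_mask, sym_mask, pack(FINAL)


def matches(text):
    """Whole-string match by simulating the automaton on bitmasks."""
    follow_mask, sym_mask, final_mask = build_tables()
    current = 1  # only state 0 is active at the start
    for ch in text:
        reachable, s, m = 0, 0, current
        while m:  # union the successor masks of all active states
            if m & 1:
                reachable |= follow_mask[s]
            m >>= 1
            s += 1
        # Keep only the successors whose (single) incoming symbol is ch.
        current = reachable & sym_mask.get(ch, 0)
        if current == 0:
            return False
    return bool(current & final_mask)
```

With six states, each successor set fits in six bits, and the transition labels cost one symbol per state rather than one per edge. How the paper packs these bits, and which practical regex features it exploits, is not stated in the abstract.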

Notes

  1. We bring this example from polyglot [13]. Here, is a character class for alphanumeric characters and an underscore.

References

  1. Almeida, A., Almeida, M., Alves, J., Moreira, N., Reis, R.: FAdo and GUItar. In: Maneth, S. (ed.) CIAA 2009. LNCS, vol. 5642, pp. 65–74. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02979-0_10

  2. Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. In: Proceedings of the 20th International Conference on Implementation and Application of Automata, pp. 292–304 (2015)

  3. Berglund, M., van der Merwe, B., van Litsenborgh, S.: Regular expressions with lookahead. J. Univ. Comput. Sci. 27(4), 324–340 (2021)

  4. Brüggemann-Klein, A.: Regular expressions into finite automata. Theoret. Comput. Sci. 120(2), 197–213 (1993)

  5. Caron, P., Ziadi, D.: Characterization of Glushkov automata. Theoret. Comput. Sci. 233(1–2), 75–90 (2000)

  6. Champarnaud, J., Coulon, F., Paranthoën, T.: Compact and fast algorithms for safe regular expression search. Int. J. Comput. Math. 81(4), 383–401 (2004)

  7. Chang, C., Paige, R.: From regular expressions to DFA’s using compressed NFA’s. Theoret. Comput. Sci. 178(1–2), 1–36 (1997)

  8. Contributors: Compilation and reuse in regular expressions, September 2021. https://learn.microsoft.com/en-us/dotnet/standard/base-types/compilation-and-reuse-in-regular-expressions. Accessed 25 Apr 2023

  9. Cortes, C., Mohri, M.: Learning with weighted transducers. In: Proceedings of the 7th International Workshop on Finite-State Methods and Natural Language Processing, pp. 14–22 (2008)

  10. CTRE: Compile time regular expression in C++, January 2023. https://github.com/hanickadot/compile-time-regular-expressions. Accessed 25 Apr 2023

  11. Daciuk, J.: Experiments with automata compression. In: Proceedings of the 5th International Conference on Implementation and Application of Automata, pp. 105–112 (2001)

  12. Daciuk, J., Weiss, D.: Smaller representation of finite state automata. In: Proceedings of the 16th International Conference on Implementation and Application of Automata, pp. 118–129 (2011)

  13. Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 443–454 (2019)

  14. Davis, J.C., Moyer, D., Kazerouni, A.M., Lee, D.: Testing regex generalizability and its implications: a large-scale many-language measurement study. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, pp. 427–439 (2019)

  15. Giammarresi, D., Ponty, J., Wood, D., Ziadi, D.: A characterization of Thompson digraphs. Discret. Appl. Math. 134(1–3), 317–337 (2004)

  16. Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1–53 (1961)

  17. Hossain, S.: Visualization of bioinformatics data with dash bio. In: Proceedings of the 18th Python in Science Conference, pp. 126–133 (2019)

  18. Lin, J., Chen, W.M., Lin, Y., Cohn, J., Gan, C., Han, S.: MCUNet: tiny deep learning on IoT devices. In: Advances in Neural Information Processing Systems, vol. 33, pp. 11711–11722 (2020)

  19. Luo, B., Lee, D., Lee, W., Liu, P.: QFilter: fine-grained run-time XML access control via NFA-based query rewriting. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pp. 543–552 (2004)

  20. McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IRE Trans. Electron. Comput. EC-9(1), 39–47 (1960)

  21. Meiners, C.R., Patel, J., Norige, E., Torng, E., Liu, A.X.: Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In: Proceedings of 19th USENIX Security Symposium, pp. 111–126 (2010)

  22. van der Merwe, B., Mouton, J., van Litsenborgh, S., Berglund, M.: Memoized regular expressions. In: Maneth, S. (ed.) CIAA 2021. LNCS, vol. 12803, pp. 39–52. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79121-6_4

  23. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, pp. 89–100 (2007)

  24. Nunes, D.S.N., Ayala-Rincón, M.: A compressed suffix tree based implementation with low peak memory usage. Electron. Notes Theoret. Comput. Sci. 302, 73–94 (2014)

  25. Ramey, R.: November 2004. https://www.boost.org/doc/libs/1_82_0/libs/serialization/doc/index.html. Accessed 25 Apr 2023

  26. Raymond, D.R., Wood, D.: Grail: a C++ library for automata and expressions. J. Symb. Comput. 17(4), 341–350 (1994)

  27. Rodger, S.H., Finley, T.W.: JFLAP: An Interactive Formal Languages and Automata Package. Jones & Bartlett Learning (2006)

  28. Sung, S., Cheon, H., Han, Y.S.: How to settle the ReDoS problem: back to the classical automata theory. In: Proceedings of the 26th International Conference on Implementation and Application of Automata, pp. 34–49 (2022)

  29. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)

  30. Xeger: Xeger (2019). https://pypi.org/project/xeger/. Accessed 25 Apr 2023

  31. Zani, S., Riani, M., Corbellini, A.: Robust bivariate boxplots and multiple outlier detection. Comput. Stat. Data Anal. 28(3), 257–270 (1998)

Acknowledgments

We wish to thank the referees for the careful reading of the paper and valuable suggestions. This research was supported by the NRF grant funded by MIST (RS-2023-00208094).

Author information

Corresponding author

Correspondence to Yo-Sub Han.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sung, S., Ko, SK., Han, YS. (2023). Smaller Representation of Compiled Regular Expressions. In: Nagy, B. (eds) Implementation and Application of Automata. CIAA 2023. Lecture Notes in Computer Science, vol 14151. Springer, Cham. https://doi.org/10.1007/978-3-031-40247-0_22

  • DOI: https://doi.org/10.1007/978-3-031-40247-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40246-3

  • Online ISBN: 978-3-031-40247-0

  • eBook Packages: Computer Science, Computer Science (R0)
