Abstract
We consider the problem of running regex pattern matching in a space-efficient manner. Given a regex, we propose a bit-packing scheme that compactly represents its compiled form, the position automaton. The scheme reduces the representation size further by exploiting the homogeneous property of position automata and practical features of regexes. We implement the proposed scheme and evaluate its memory consumption on a practical regex benchmark dataset. Our approach produces a much smaller representation than two common FA representations. In addition, experimental results show that our bit-packing regex engine is effective for matching regexes that have large compiled forms: it consumes less memory than the current state-of-the-art regex engine (RE2).
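To make the role of the homogeneous property concrete, the following is a minimal sketch and not the encoding proposed in the paper. In a position automaton every transition entering a state carries the same symbol, so a label can be stored once per state instead of once per transition, and each follow set fits in a single bitmask. The names `pack` and `PackedPositionAutomaton` are hypothetical, and the automaton for `(a|b)*abb` is written out by hand rather than compiled from the regex.

```python
from typing import Dict, Iterable, List


def pack(positions: Iterable[int]) -> int:
    """Pack a set of position indices into one integer bitmask."""
    mask = 0
    for p in positions:
        mask |= 1 << p
    return mask


class PackedPositionAutomaton:
    """Bitmask view of a position automaton (illustrative only)."""

    def __init__(self, labels: str, first: Iterable[int],
                 follow: List[Iterable[int]], last: Iterable[int],
                 nullable: bool) -> None:
        # Homogeneity: one label per position, so we keep one mask per
        # symbol (bit p is set iff position p is labelled with that symbol).
        self.symbol_mask: Dict[str, int] = {}
        for p, sym in enumerate(labels):
            self.symbol_mask[sym] = self.symbol_mask.get(sym, 0) | (1 << p)
        self.first = pack(first)
        self.follow = [pack(f) for f in follow]
        self.last = pack(last)
        self.nullable = nullable

    def matches(self, text: str) -> bool:
        """Return True iff the whole text is accepted."""
        if not text:
            return self.nullable
        active = self.first & self.symbol_mask.get(text[0], 0)
        for ch in text[1:]:
            if not active:
                return False
            reachable = 0
            rest = active
            while rest:
                low = rest & -rest  # isolate the lowest set bit
                reachable |= self.follow[low.bit_length() - 1]
                rest ^= low
            # Keep only the positions whose label equals the input symbol.
            active = reachable & self.symbol_mask.get(ch, 0)
        return bool(active & self.last)


# Hand-built position automaton for (a|b)*abb.
# Positions: 0:a 1:b (inside the star), 2:a 3:b 4:b (the suffix abb).
pa = PackedPositionAutomaton(
    labels="ababb",
    first=[0, 1, 2],
    follow=[[0, 1, 2], [0, 1, 2], [3], [4], []],
    last=[4],
    nullable=False,
)
```

With this layout, `pa.matches("aabb")` walks the input while touching only integer masks. A real engine would additionally need character classes, anchors, and a fixed-width packing of the masks, none of which this sketch attempts.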
Notes
1. We take this example from polyglot [13]. Here, \w is a character class for alphanumeric characters and an underscore.
References
Almeida, A., Almeida, M., Alves, J., Moreira, N., Reis, R.: FAdo and GUItar. In: Maneth, S. (ed.) CIAA 2009. LNCS, vol. 5642, pp. 65–74. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02979-0_10
Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. In: Proceedings of the 20th International Conference on Implementation and Application of Automata, pp. 292–304 (2015)
Berglund, M., van der Merwe, B., van Litsenborgh, S.: Regular expressions with lookahead. J. Univ. Comput. Sci. 27(4), 324–340 (2021)
Brüggemann-Klein, A.: Regular expressions into finite automata. Theoret. Comput. Sci. 120(2), 197–213 (1993)
Caron, P., Ziadi, D.: Characterization of Glushkov automata. Theoret. Comput. Sci. 233(1–2), 75–90 (2000)
Champarnaud, J., Coulon, F., Paranthoën, T.: Compact and fast algorithms for safe regular expression search. Int. J. Comput. Math. 81(4), 383–401 (2004)
Chang, C., Paige, R.: From regular expressions to DFA’s using compressed NFA’s. Theoret. Comput. Sci. 178(1–2), 1–36 (1997)
Contributors: Compilation and reuse in regular expressions, September 2021. https://learn.microsoft.com/en-us/dotnet/standard/base-types/compilation-and-reuse-in-regular-expressions. Accessed 25 Apr 2023
Cortes, C., Mohri, M.: Learning with weighted transducers. In: Proceedings of the 7th International Workshop on Finite-State Methods and Natural Language Processing, pp. 14–22 (2008)
CTRE: Compile time regular expression in C++, January 2023. https://github.com/hanickadot/compile-time-regular-expressions. Accessed 25 Apr 2023
Daciuk, J.: Experiments with automata compression. In: Proceedings of the 5th International Conference on Implementation and Application of Automata, pp. 105–112 (2001)
Daciuk, J., Weiss, D.: Smaller representation of finite state automata. In: Proceedings of the 16th International Conference on Implementation and Application of Automata, pp. 118–129 (2011)
Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 443–454 (2019)
Davis, J.C., Moyer, D., Kazerouni, A.M., Lee, D.: Testing regex generalizability and its implications: a large-scale many-language measurement study. In: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, pp. 427–439 (2019)
Giammarresi, D., Ponty, J., Wood, D., Ziadi, D.: A characterization of Thompson digraphs. Discret. Appl. Math. 134(1–3), 317–337 (2004)
Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1–53 (1961)
Hossain, S.: Visualization of bioinformatics data with dash bio. In: Proceedings of the 18th Python in Science Conference, pp. 126–133 (2019)
Lin, J., Chen, W.M., Lin, Y., Cohn, J., Gan, C., Han, S.: MCUNet: tiny deep learning on IoT devices. In: Advances in Neural Information Processing Systems, vol. 33, pp. 11711–11722 (2020)
Luo, B., Lee, D., Lee, W., Liu, P.: QFilter: fine-grained run-time XML access control via NFA-based query rewriting. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pp. 543–552 (2004)
McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IRE Trans. Electron. Comput. EC-9(1), 39–47 (1960)
Meiners, C.R., Patel, J., Norige, E., Torng, E., Liu, A.X.: Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In: Proceedings of 19th USENIX Security Symposium, pp. 111–126 (2010)
van der Merwe, B., Mouton, J., van Litsenborgh, S., Berglund, M.: Memoized regular expressions. In: Maneth, S. (ed.) CIAA 2021. LNCS, vol. 12803, pp. 39–52. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79121-6_4
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, pp. 89–100 (2007)
Nunes, D.S.N., Ayala-Rincón, M.: A compressed suffix tree based implementation with low peak memory usage. Electron. Notes Theoret. Comput. Sci. 302, 73–94 (2014)
Ramey, R.: November 2004. https://www.boost.org/doc/libs/1_82_0/libs/serialization/doc/index.html. Accessed 25 Apr 2023
Raymond, D.R., Wood, D.: Grail: a C++ library for automata and expressions. J. Symb. Comput. 17(4), 341–350 (1994)
Rodger, S.H., Finley, T.W.: JFLAP: An Interactive Formal Languages and Automata Package. Jones & Bartlett Learning (2006)
Sung, S., Cheon, H., Han, Y.S.: How to settle the ReDoS problem: back to the classical automata theory. In: Proceedings of the 26th International Conference on Implementation and Application of Automata, pp. 34–49 (2022)
Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Xeger: Xeger (2019). https://pypi.org/project/xeger/. Accessed 25 Apr 2023
Zani, S., Riani, M., Corbellini, A.: Robust bivariate boxplots and multiple outlier detection. Comput. Stat. Data Anal. 28(3), 257–270 (1998)
Acknowledgments
We wish to thank the referees for the careful reading of the paper and valuable suggestions. This research was supported by the NRF grant funded by MIST (RS-2023-00208094).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sung, S., Ko, SK., Han, YS. (2023). Smaller Representation of Compiled Regular Expressions. In: Nagy, B. (eds) Implementation and Application of Automata. CIAA 2023. Lecture Notes in Computer Science, vol 14151. Springer, Cham. https://doi.org/10.1007/978-3-031-40247-0_22
DOI: https://doi.org/10.1007/978-3-031-40247-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40246-3
Online ISBN: 978-3-031-40247-0
eBook Packages: Computer Science, Computer Science (R0)