Abstract
Finite languages lie at the heart of every regular expression. Therefore, we investigate the approximation complexity of minimizing regular expressions without Kleene star, or, equivalently, regular expressions describing finite languages. On the side of approximation hardness, given such an expression of size s, we prove that it is impossible to approximate the minimum size required by an equivalent regular expression within a factor of \(O\left( \frac{s}{(\log s)^{\delta }}\right) \) if the running time is bounded by a quasipolynomial function depending on \(\delta \), for every \(\delta >1\), unless the exponential time hypothesis (ETH) fails. For approximation ratio \(O(s^{1-\delta })\), we prove an exponential-time lower bound depending on \(\delta \), assuming ETH. The lower bounds apply to alphabets of constant size. On the algorithmic side, we show that the problem can be approximated in polynomial time within \(O(\frac{s\log \log s}{\log s})\), with s being the size of the given regular expression. For constant alphabet size, the bound improves to \(O(\frac{s}{\log s})\). Finally, we devise a family of superpolynomial approximation algorithms with approximation ratios matching the lower bounds, while the running times are just above those excluded by the ETH.
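To make the minimization problem concrete, the following Python sketch compares two star-free expressions for the same finite language: the plain union of all words, and an expression obtained by factoring out common prefixes along a trie. This is only an illustration of how the size of equivalent expressions can differ; it is not one of the approximation algorithms of the paper. The function names are hypothetical, the expressions use the syntax of Python's `re` module (`|` for union, an empty alternative for the empty word), and size is assumed to be measured as alphabetic width, that is, the number of occurrences of alphabet symbols.

```python
import re


def naive_expression(words):
    """Union of all words; every letter of every word occurs once."""
    return "|".join(sorted(words))


def build_trie(words):
    """Build a nested-dict trie; the empty key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_expression(node):
    """Star-free expression for the words below a trie node.

    Assumes a nonempty set of alphanumeric words was inserted.
    """
    alternatives = []
    for ch in sorted(node):
        if ch == "":
            alternatives.append("")  # empty alternative = empty word
        else:
            alternatives.append(ch + trie_expression(node[ch]))
    if len(alternatives) == 1:
        return alternatives[0]
    return "(" + "|".join(alternatives) + ")"


def alphabetic_width(expr):
    """Count occurrences of alphabet symbols (assumed alphanumeric)."""
    return sum(c.isalnum() for c in expr)


words = {"ab", "abc", "abd", "xbc"}
naive = naive_expression(words)                # ab|abc|abd|xbc
factored = trie_expression(build_trie(words))  # (ab(|c|d)|xbc)
print(naive, alphabetic_width(naive))          # width 11
print(factored, alphabetic_width(factored))    # width 7
print(all(re.fullmatch(factored, w) for w in words))
```

Prefix factoring is a heuristic: it never increases the alphabetic width compared to the plain union, but it gives no guarantee of being close to the minimum, which is the quantity the stated bounds are about.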
Notes
1. See for example the following questions drawn from various sites: (i) P. Krauss: Minimal regular expression that matches a given set of words, URL: https://cs.stackexchange.com/q/72344, Accessed: 2021-01-02, (ii) J. Mason: A released perl with trie-based regexps! URL: http://taint.org/2006/07/07/184022a.html, Accessed: 2020-07-21, (iii) pdanese (StackOverflow username): Speed up millions of regex replacements in Python 3, URL: https://stackoverflow.com/q/42742810, Accessed: 2021-01-02, (iv) P. Scheibe: RegEx performance: Alternation vs Trie, URL: https://stackoverflow.com/q/56177330, Accessed: 2021-01-02, and (v) Ch. Xu: Minimizing size of regular expression for finite sets, URL: https://cstheory.stackexchange.com/q/16860, Accessed: 2021-01-02.
2. See, e.g., item (v) of the previous footnote.
3. For convenience, parentheses in regular expressions are sometimes omitted and concatenation is sometimes simply written as juxtaposition. The priority of operators is specified in the usual fashion: concatenation is performed before union, and star before both concatenation and union.
4. We say that a function f(n) is time constructible if there exists an f(n) time-bounded multitape Turing machine M such that for each n there exists some input on which M actually makes f(n) moves [17].
5. The grammar in [16, Proposition 8.3] does not generate all valid regular expressions, but incorporates some performance tweaks. These tweaks perfectly fit our purpose: while the grammar does not generate all feasible solutions, it still generates at least one optimal solution. More precisely, given a finite language L with \(\mathsf {awidth}(L) = k\), the context-free grammar is guaranteed to enumerate a regular expression of alphabetic width k for it.
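The operator priorities described in the note on omitted parentheses can be checked with a small Python sketch. It assumes the syntax of Python's `re` module, which writes union as `|` and follows the same priorities (star before concatenation, concatenation before union); the helper `accepted` and the sample words are hypothetical and chosen only for illustration.

```python
import re


def accepted(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]


samples = ["", "ab", "c", "ccc", "abc", "abab", "abccc", "ac"]

# With parentheses omitted, "ab|c*" is read as (ab)|((c)*).
implicit = accepted("ab|c*", samples)
explicit = accepted("(ab)|((c)*)", samples)
print(implicit == explicit)  # True

# Other ways of placing parentheses describe different languages.
print(accepted("(ab|c)*", samples))
print(accepted("a(b|c)*", samples))
```

Only the fully parenthesized reading that respects the priorities agrees with the abbreviated expression on all samples; the other two placements accept words such as `abab` or `ac` that the abbreviated expression rejects.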
References
Abboud, A., Backurs, A., Williams, V.V.: If the current clique algorithms are optimal, so is Valiant’s parser. SIAM J. Comput. 47(6), 2527–2555 (2018)
Bringmann, K., Grønlund, A., Larsen, K.G.: A dichotomy for regular expression membership testing. In: Proceedings of the \(58\)th Annual IEEE Symposium on Foundations of Computer Science, pp. 307–318. IEEE, Berkeley, October 2017
Chalermsook, P., Heydrich, S., Holm, E., Karrenbauer, A.: Nearly tight approximability results for minimum biclique cover and partition. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 235–246. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44777-2_20
Clemente, L., Mayr, R.: Efficient reduction of nondeterministic automata with application to language inclusion testing. Log. Methods Comput. Sci. 15(1) (2019)
de Oliveira Oliveira, M., Wehar, M.: On the fine grained complexity of finite automata non-emptiness of intersection. In: Jonoska, N., Savchuk, D. (eds.) DLT 2020. LNCS, vol. 12086, pp. 69–82. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48516-0_6
Eberhard, S., Hetzl, St.: On the compressibility of finite languages and formal proofs. Inform. Comput. 259, 191–213 (2018)
Ellul, K., Krawetz, B., Shallit, J., Wang, M.: Regular expressions: new results and open problems. J. Autom. Lang. Comb. 10(4), 407–437 (2005)
Fernau, H., Krebs, A.: Problems on finite automata and the exponential time hypothesis. Algorithms 10(1), 24 (2017)
Florêncio, Ch.C., Daenen, J., Ramon, J., Van den Bussche, J., Van Dyck, D.: Naive infinite enumeration of context-free languages in incremental polynomial time. J. Univ. Comput. Sci. 21(7), 891–911 (2015)
Gramlich, G., Schnitger, G.: Minimizing NFA’s and regular expressions. J. Comput. Syst. Sci. 73(6), 908–923 (2007)
Gruber, H., Holzer, M.: Computational complexity of NFA minimization for finite and unary languages. In: Preproceedings of the 1st International Conference on Language and Automata Theory and Applications, Technical Report 35/07, pp. 261–272. Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Tarragona, March 2007
Gruber, H., Holzer, M.: Inapproximability of nondeterministic state and transition complexity assuming P \(\ne \) NP. In: Harju, T., Karhumäki, J., Lepistö, A. (eds.) DLT 2007. LNCS, vol. 4588, pp. 205–216. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73208-2_21
Gruber, H., Holzer, M.: Language operations with regular expressions of polynomial size. Theoret. Comput. Sci. 410(35), 3281–3289 (2009)
Gruber, H., Holzer, M.: Optimal regular expressions for palindromes of given length. In: Bonchi, F., Puglisi, S.J. (eds.) Proceedings of the \(46\)th International Symposium on Mathematical Foundations of Computer Science, Leibniz International Proceedings in Informatics. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2021, accepted)
Gruber, H., Holzer, M., Wolfsteiner, S.: On minimal grammar problems for finite languages. In: Hoshi, M., Seki, S. (eds.) DLT 2018. LNCS, vol. 11088, pp. 342–353. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98654-8_28
Gruber, H., Lee, J., Shallit, J.: Enumerating regular expressions and their languages. arXiv:1204.4982 [cs.FL], April 2012
Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Boston (1979)
Hunt III, H.B.: On the time and tape complexity of languages I. In: Proceedings of the \(5\)th Annual ACM Symposium on Theory of Computing, pp. 10–19. ACM, Austin, April-May 1973
Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)
Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the exponential time hypothesis. Bull. Eur. Assoc. Theor. Comput. Sci. 105, 41–72 (2011)
Mandl, R.: Precise bounds associated with the subset construction on various classes of nondeterministic finite automata. In: Proceedings of the \(7\)th Princeton Conference on Information and System Sciences, pp. 263–267, March 1973
Meyer, A.R., Stockmeyer, L. J.: The equivalence problem for regular expressions with squaring requires exponential time. In: Proceedings of the \(13\)th Annual Symposium on Switching and Automata Theory, pp. 125–129. IEEE Society Press, October 1972
Mráz, F., Průša, D., Wehar, M.: Two-dimensional pattern matching against basic picture languages. In: Hospodár, M., Jirásková, G. (eds.) CIAA 2019. LNCS, vol. 11601, pp. 209–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23679-3_17
Wehar, M.: Hardness results for intersection non-emptiness. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014, Part II. LNCS, vol. 8573, pp. 354–362. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43951-7_30
Williams, V.V.: On some fine-grained questions in algorithms and complexity. In: Sirakov, B., de Souza, P.N., Viana, M. (eds.) Proceedings of the International Congress of Mathematicians, pp. 3447–3487. World Scientific, Rio de Janeiro, April 2018
Acknowledgments
We would like to thank Michael Wehar for some discussion, and the anonymous reviewers for their valuable comments.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gruber, H., Holzer, M., Wolfsteiner, S. (2021). On Minimizing Regular Expressions Without Kleene Star. In: Bampis, E., Pagourtzis, A. (eds) Fundamentals of Computation Theory. FCT 2021. Lecture Notes in Computer Science, vol 12867. Springer, Cham. https://doi.org/10.1007/978-3-030-86593-1_17
DOI: https://doi.org/10.1007/978-3-030-86593-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86592-4
Online ISBN: 978-3-030-86593-1
eBook Packages: Computer Science, Computer Science (R0)