On Minimizing Regular Expressions Without Kleene Star

  • Conference paper
Fundamentals of Computation Theory (FCT 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12867)

Abstract

Finite languages lie at the heart of literally every regular expression. Therefore, we investigate the approximation complexity of minimizing regular expressions without Kleene star, or, equivalently, regular expressions describing finite languages. On the side of approximation hardness, given such an expression of size s, we prove that it is impossible to approximate the minimum size required by an equivalent regular expression within a factor of \(O\left( \frac{s}{(\log s)^{\delta }}\right) \) if the running time is bounded by a quasipolynomial function depending on \(\delta \), for every \(\delta >1\), unless the exponential time hypothesis (ETH) fails. For approximation ratio \(O(s^{1-\delta })\), we prove an exponential-time lower bound depending on \(\delta \), assuming ETH. The lower bounds apply to alphabets of constant size. On the algorithmic side, we show that the problem can be approximated in polynomial time within \(O(\frac{s\log \log s}{\log s})\), with s being the size of the given regular expression. For constant alphabet size, the bound improves to \(O(\frac{s}{\log s})\). Finally, we devise a family of superpolynomial approximation algorithms with approximation ratios matching the lower bounds, while the running times are just above the lower bounds excluded by the exponential time hypothesis.
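
To make the notion of expression size concrete, here is a minimal sketch in Python that is not taken from the paper. It assumes size is measured by alphabetic width (the number of occurrences of alphabet symbols) and compares a plain union of words with a trie-style factorization of common prefixes. The function names `alphabetic_width`, `naive_union`, `trie_regex`, and `language` are hypothetical, and the sketch assumes a nonempty, prefix-free set of words over letters that have no special meaning in Python's `re` syntax.

```python
import re
from itertools import product


def alphabetic_width(regex):
    """Count symbol occurrences, ignoring the operator '|' and parentheses."""
    return sum(ch not in "()|" for ch in regex)


def naive_union(words):
    """Describe a finite language as the union of all its words."""
    return "|".join(sorted(words))


def _emit(node):
    # A node carrying the end marker "" must be a leaf (prefix-free assumption).
    if "" in node:
        if len(node) > 1:
            raise ValueError("word set must be prefix-free")
        return ""
    branches = [ch + _emit(child) for ch, child in sorted(node.items())]
    return branches[0] if len(branches) == 1 else "(" + "|".join(branches) + ")"


def trie_regex(words):
    """Factor out common prefixes by walking a trie of the words."""
    if not words:
        raise ValueError("word set must be nonempty")
    trie = {}
    for word in sorted(words):
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return _emit(trie)


def language(regex, alphabet, max_len):
    """All words up to length max_len matched by regex (Python re semantics)."""
    candidates = ("".join(t) for n in range(max_len + 1)
                  for t in product(alphabet, repeat=n))
    return {w for w in candidates if re.fullmatch(regex, w)}


words = {"aba", "abb", "bba", "bbb"}
print(naive_union(words), alphabetic_width(naive_union(words)))
print(trie_regex(words), alphabetic_width(trie_regex(words)))
```

For these four words the plain union has alphabetic width 12 and the prefix factorization has width 8, while the equivalent expression `(a|b)b(a|b)` has width 5. Factoring prefixes alone therefore does not yield a minimum expression.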

Notes

  1. See for example the following questions drawn from various sites: (i) P. Krauss: Minimal regular expression that matches a given set of words, URL: https://cs.stackexchange.com/q/72344, Accessed: 2021-01-02, (ii) J. Mason: A released perl with trie-based regexps! URL: http://taint.org/2006/07/07/184022a.html, Accessed: 2020-07-21, (iii) pdanese (StackOverflow username): Speed up millions of regex replacements in Python 3, URL: https://stackoverflow.com/q/42742810, Accessed: 2021-01-02, (iv) P. Scheibe: RegEx performance: Alternation vs Trie, URL: https://stackoverflow.com/q/56177330, Accessed: 2021-01-02, and (v) Ch. Xu: Minimizing size of regular expression for finite sets, URL: https://cstheory.stackexchange.com/q/16860, Accessed: 2021-01-02.

  2. See, e.g., item (v) of the previous footnote.

  3. For convenience, parentheses in regular expressions are sometimes omitted and concatenation is sometimes simply written as juxtaposition. The priority of operators is specified in the usual fashion: concatenation is performed before union, and star before both concatenation and union.

  4. We say that a function f(n) is time constructible if there exists an f(n) time-bounded multitape Turing machine M such that for each n there exists some input on which M actually makes f(n) moves [17].

  5. The grammar in [16, Proposition 8.3] does not generate all valid regular expressions, but incorporates some performance tweaks. These tweaks perfectly fit our purpose: while the grammar does not generate all feasible solutions, it still generates at least one optimal solution. More precisely, given a finite language L with \(\mathsf {awidth}(L) = k\), the context-free grammar is guaranteed to enumerate a regular expression of alphabetic width k for it.
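
The idea of exhaustively searching for a minimum expression can be sketched as follows. This is not the grammar of [16]; it is a brute-force dynamic program over languages, written in Python under the assumption that expressions use only alphabet symbols, ε, union, and concatenation, and that size is alphabetic width. The function name `min_alphabetic_width` and the parameter `max_width` are hypothetical. The search takes exponential time and is only meant for very small languages.

```python
from itertools import product


def min_alphabetic_width(target, max_width=8):
    """Return (k, expr): a star-free expression of minimum alphabetic width k
    for the finite language `target`, or None if k would exceed max_width."""
    target = frozenset(target)
    if not target:
        raise ValueError("target language must be nonempty")
    eps = frozenset({""})
    if target == eps:
        return 0, "ε"
    # Pruning: without the empty-set symbol, every word of a subexpression
    # is a factor (substring) of some word of the target language.
    factors = {w[i:j] for w in target
               for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    alphabet = sorted({ch for w in target for ch in w})
    best = {eps: "ε"}   # language -> one expression of minimum width
    levels = [[eps]]    # levels[k] = languages of minimum width exactly k

    def add(lang, expr, new):
        # Record a language the first time it appears, i.e. at minimum width.
        if lang not in best and lang <= factors:
            best[lang] = expr
            new.append(lang)

    for k in range(1, max_width + 1):
        new = []
        if k == 1:
            for a in alphabet:
                add(frozenset({a}), a, new)
        # Split the width k between the two operands of a union or product.
        for i in range(1, k):
            for l1, l2 in product(levels[i], levels[k - i]):
                add(l1 | l2, f"({best[l1]}|{best[l2]})", new)
                concat = frozenset(u + v for u in l1 for v in l2)
                add(concat, f"{best[l1]}{best[l2]}", new)
        # A union with ε adds the empty word without increasing the width.
        for lang in list(new):
            add(lang | eps, f"({best[lang]}|ε)", new)
        levels.append(new)
        if target in best:
            return k, best[target]
    return None


print(min_alphabetic_width({"ab", "a"}))
print(min_alphabetic_width({"aba", "abb", "bba", "bbb"}))
```

Each language is stored only at the first width at which it appears, so the width returned for the target is the smallest one reachable with the operators assumed here.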

References

  1. Abboud, A., Backurs, A., Williams, V.V.: If the current clique algorithms are optimal, so is Valiant’s parser. SIAM J. Comput. 47(6), 2527–2555 (2018)

  2. Bringmann, K., Grønlund, A., Larsen, K.G.: A dichotomy for regular expression membership testing. In: Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science, pp. 307–318. IEEE, Berkeley, October 2017

  3. Chalermsook, P., Heydrich, S., Holm, E., Karrenbauer, A.: Nearly tight approximability results for minimum biclique cover and partition. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 235–246. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44777-2_20

  4. Clemente, L., Mayr, R.: Efficient reduction of nondeterministic automata with application to language inclusion testing. Log. Methods Comput. Sci. 15(1) (2019)

  5. de Oliveira Oliveira, M., Wehar, M.: On the fine grained complexity of finite automata non-emptiness of intersection. In: Jonoska, N., Savchuk, D. (eds.) DLT 2020. LNCS, vol. 12086, pp. 69–82. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48516-0_6

  6. Eberhard, S., Hetzl, St.: On the compressibility of finite languages and formal proofs. Inform. Comput. 259, 191–213 (2018)

  7. Ellul, K., Krawetz, B., Shallit, J., Wang, M.: Regular expressions: new results and open problems. J. Autom. Lang. Comb. 10(4), 407–437 (2005)

  8. Fernau, H., Krebs, A.: Problems on finite automata and the exponential time hypothesis. Algorithms 10(1), 24 (2017)

  9. Florêncio, Ch.C., Daenen, J., Ramon, J., Van den Bussche, J., Van Dyck, D.: Naive infinite enumeration of context-free languages in incremental polynomial time. J. Univ. Comput. Sci. 21(7), 891–911 (2015)

  10. Gramlich, G., Schnitger, G.: Minimizing NFA’s and regular expressions. J. Comput. Syst. Sci. 73(6), 908–923 (2007)

  11. Gruber, H., Holzer, M.: Computational complexity of NFA minimization for finite and unary languages. In: Preproceedings of the 1st International Conference on Language and Automata Theory and Applications, Technical Report 35/07, pp. 261–272. Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Tarragona, March 2007

  12. Gruber, H., Holzer, M.: Inapproximability of nondeterministic state and transition complexity assuming P \(\ne \) NP. In: Harju, T., Karhumäki, J., Lepistö, A. (eds.) DLT 2007. LNCS, vol. 4588, pp. 205–216. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73208-2_21

  13. Gruber, H., Holzer, M.: Language operations with regular expressions of polynomial size. Theoret. Comput. Sci. 410(35), 3281–3289 (2009)

  14. Gruber, H., Holzer, M.: Optimal regular expressions for palindromes of given length. In: Bonchi, F., Puglisi, S.J. (eds.) Proceedings of the 46th International Symposium on Mathematical Foundations of Computer Science, Leibniz International Proceedings in Informatics. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2021, accepted)

  15. Gruber, H., Holzer, M., Wolfsteiner, S.: On minimal grammar problems for finite languages. In: Hoshi, M., Seki, S. (eds.) DLT 2018. LNCS, vol. 11088, pp. 342–353. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98654-8_28

  16. Gruber, H., Lee, J., Shallit, J.: Enumerating regular expressions and their languages. arXiv:1204.4982 [cs.FL], April 2012

  17. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Boston (1979)

  18. Hunt III, H.B.: On the time and tape complexity of languages I. In: Proceedings of the 5th Annual ACM Symposium on Theory of Computing, pp. 10–19. ACM, Austin, April–May 1973

  19. Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)

  20. Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the exponential time hypothesis. Bull. Eur. Assoc. Theor. Comput. Sci. 105, 41–72 (2011)

  21. Mandl, R.: Precise bounds associated with the subset construction on various classes of nondeterministic finite automata. In: Proceedings of the 7th Princeton Conference on Information and System Sciences, pp. 263–267, March 1973

  22. Meyer, A.R., Stockmeyer, L.J.: The equivalence problem for regular expressions with squaring requires exponential time. In: Proceedings of the 13th Annual Symposium on Switching and Automata Theory, pp. 125–129. IEEE Society Press, October 1972

  23. Mráz, F., Průša, D., Wehar, M.: Two-dimensional pattern matching against basic picture languages. In: Hospodár, M., Jirásková, G. (eds.) CIAA 2019. LNCS, vol. 11601, pp. 209–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23679-3_17

  24. Wehar, M.: Hardness results for intersection non-emptiness. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014, Part II. LNCS, vol. 8573, pp. 354–362. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43951-7_30

  25. Williams, V.V.: On some fine-grained questions in algorithms and complexity. In: Sirakov, B., de Souza, P.N., Viana, M. (eds.) Proceedings of the International Congress of Mathematicians, pp. 3447–3487. World Scientific, Rio de Janeiro, April 2018

Acknowledgments

We would like to thank Michael Wehar for some discussion, and the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Markus Holzer.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Gruber, H., Holzer, M., Wolfsteiner, S. (2021). On Minimizing Regular Expressions Without Kleene Star. In: Bampis, E., Pagourtzis, A. (eds) Fundamentals of Computation Theory. FCT 2021. Lecture Notes in Computer Science, vol 12867. Springer, Cham. https://doi.org/10.1007/978-3-030-86593-1_17

  • DOI: https://doi.org/10.1007/978-3-030-86593-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86592-4

  • Online ISBN: 978-3-030-86593-1

  • eBook Packages: Computer Science, Computer Science (R0)
