Efficient Enumeration of Regular Expressions for Faster Regular Expression Synthesis

  • Conference paper
Implementation and Application of Automata (CIAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12803)

Abstract

We study the problem of synthesizing regular expressions from a set of positive and negative strings. The previous synthesis algorithm, proposed by Lee et al. [12], relies on best-first enumeration of regular expressions. To improve the performance of the enumeration process, we define a new normal form of regular expressions, called the concise normal form, which significantly reduces the search space by pruning expressions that are not in the normal form while still capturing the whole class of regular languages. We conduct experiments on two benchmark datasets and demonstrate that our synthesis algorithm based on the proposed normal form outperforms the previous algorithm in terms of runtime complexity and scalability.
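
To make the problem setting concrete, the following is a minimal sketch of best-first enumeration over partial regular expressions with holes. It is an illustration under stated assumptions, not the algorithm of Lee et al. [12] and not the concise normal form: it assumes a binary alphabet, a simple size-based cost, and Python's re module for membership tests, and it performs no pruning. All names (HOLE, cost, fill_first_hole, synthesize, max_cost) are hypothetical.

```python
import heapq
import itertools
import re

# Assumed tree encoding: ("sym", a), ("cat", l, r), ("alt", l, r), ("star", e), ("hole",)
HOLE = ("hole",)


def cost(node):
    """Size-based cost (an assumption): each symbol, operator, and hole counts 1."""
    return 1 + sum(cost(child) for child in node[1:] if isinstance(child, tuple))


def is_complete(node):
    """True if the tree contains no hole."""
    if node == HOLE:
        return False
    return all(is_complete(c) for c in node[1:] if isinstance(c, tuple))


def fill_first_hole(node, replacement):
    """Copy of `node` with its leftmost hole replaced, or None if there is no hole."""
    if node == HOLE:
        return replacement
    if node[0] == "sym":
        return None
    for i in range(1, len(node)):
        filled = fill_first_hole(node[i], replacement)
        if filled is not None:
            return node[:i] + (filled,) + node[i + 1:]
    return None


def to_pattern(node):
    """Render a complete tree as a pattern string for Python's re module."""
    kind = node[0]
    if kind == "sym":
        return re.escape(node[1])
    if kind == "cat":
        return to_pattern(node[1]) + to_pattern(node[2])
    if kind == "alt":
        return "(?:" + to_pattern(node[1]) + "|" + to_pattern(node[2]) + ")"
    if kind == "star":
        return "(?:" + to_pattern(node[1]) + ")*"
    raise ValueError("cannot render a tree that still contains a hole")


def synthesize(positive, negative, alphabet="01", max_cost=9):
    """Return a lowest-cost pattern accepting all positives and no negatives, or None."""
    tie = itertools.count()  # tie-breaker so the heap never compares trees
    queue = [(cost(HOLE), next(tie), HOLE)]
    fillers = [("sym", a) for a in alphabet]
    fillers += [("cat", HOLE, HOLE), ("alt", HOLE, HOLE), ("star", HOLE)]
    while queue:
        _, _, state = heapq.heappop(queue)
        if is_complete(state):
            pattern = re.compile(to_pattern(state))
            accepts_all = all(pattern.fullmatch(s) for s in positive)
            rejects_all = not any(pattern.fullmatch(s) for s in negative)
            if accepts_all and rejects_all:
                return pattern.pattern
            continue
        for filler in fillers:
            successor = fill_first_hole(state, filler)
            if cost(successor) <= max_cost:  # bound keeps the search finite
                heapq.heappush(queue, (cost(successor), next(tie), successor))
    return None


if __name__ == "__main__":
    print(synthesize(["0", "00", "000"], ["", "1", "01", "10"]))
```

Because filling a hole never lowers the cost in this sketch, the first consistent expression popped from the queue has minimal cost under the assumed measure. Every state is expanded without pruning, so the queue grows quickly with max_cost; this is the kind of search space that a normal form is meant to reduce.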

Notes

  1. The OCaml implementation of AlphaRegex and dataset are publicly available at https://github.com/kupl/AlphaRegexPublic.

  2. https://pypi.org/project/xeger/.

References

  1. Broda, S., Machiavelo, A., Moreira, N., Reis, R.: On average behaviour of regular expressions in strong star normal form. Int. J. Found. Comput. Sci. 30(6–7), 899–920 (2019)

  2. Brüggemann-Klein, A.: Regular expressions into finite automata. Theoret. Comput. Sci. 120(2), 197–213 (1993)

  3. Chen, Q., Wang, X., Ye, X., Durrett, G., Dillig, I.: Multi-modal synthesis of regular expressions. In: PLDI 2020, pp. 487–502 (2020)

  4. Chomsky, N., Schützenberger, M.: The algebraic theory of context-free languages. In: Computer Programming and Formal Systems. Studies in Logic and the Foundations of Mathematics, vol. 35, pp. 118–161. Elsevier (1963)

  5. Ellul, K., Krawetz, B., Shallit, J.O., Wang, M.: Regular expressions: new results and open problems. J. Autom. Lang. Comb. 10(4), 407–437 (2005)

  6. Frishert, M., Watson, B.W.: Combining regular expressions with (near-)optimal Brzozowski automata. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 319–320. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30500-2_34

  7. Gruber, H., Gulan, S.: Simplifying regular expressions. In: Dediu, A.-H., Fernau, H., Martín-Vide, C. (eds.) LATA 2010. LNCS, vol. 6031, pp. 285–296. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13089-2_24

  8. Gulwani, S.: Dimensions in program synthesis. In: PPDP 2010, pp. 13–24 (2010)

  9. Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (1979)

  10. Kushman, N., Barzilay, R.: Using semantic unification to generate regular expressions from natural language. In: NAACL-HLT 2013, pp. 826–836 (2013)

  11. Lee, J., Shallit, J.: Enumerating regular expressions and their languages. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 2–22. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30500-2_2

  12. Lee, M., So, S., Oh, H.: Synthesizing regular expressions from examples for introductory automata assignments. In: GPCE 2016, pp. 70–80 (2016)

  13. Owens, S., Reppy, J.H., Turon, A.: Regular-expression derivatives re-examined. J. Funct. Program. 19(2), 173–190 (2009)

  14. Park, J., Ko, S., Cognetta, M., Han, Y.: Softregex: Generating regex from natural language descriptions using softened regex equivalence. In: EMNLP-IJCNLP 2019, pp. 6424–6430 (2019)

  15. Sipser, M.: Introduction to the Theory of Computation. Cengage Learning (2012)

  16. Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time: preliminary report. In: STOC 1973, pp. 1–9 (1973)

  17. Wang, X., Gulwani, S., Singh, R.: FIDEX: filtering spreadsheet data using examples. In: OOPSLA 2016, pp. 195–213 (2016)

  18. Wood, D.: Theory of Computation. Harper & Row (1987)

  19. Ye, X., Chen, Q., Wang, X., Dillig, I., Durrett, G.: Sketch-driven regular expression generation from natural language and examples. Trans. Assoc. Comput. Linguist. 8, 679–694 (2020)

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1A4A3079947).

Author information

Corresponding author

Correspondence to Sang-Ki Ko.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kim, SH., Im, H., Ko, SK. (2021). Efficient Enumeration of Regular Expressions for Faster Regular Expression Synthesis. In: Maneth, S. (ed.) Implementation and Application of Automata. CIAA 2021. Lecture Notes in Computer Science, vol 12803. Springer, Cham. https://doi.org/10.1007/978-3-030-79121-6_6

  • DOI: https://doi.org/10.1007/978-3-030-79121-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79120-9

  • Online ISBN: 978-3-030-79121-6

  • eBook Packages: Computer Science, Computer Science (R0)
