Efficient Index-Based Regular Expression Matching with Optimal Query Plan Tree

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13943)

Abstract

Matching a regular expression (regex) against a text arises in many applications, such as entity matching, protein sequence matching, and shell commands. Classical methods for regex matching usually rely on a finite automaton, which has a high matching cost. Recent methods instead use the positional q-gram inverted index, one of the most widely used index schemes, from which all matching results can be obtained directly. The efficiency of these methods depends critically on the query plan tree, which is built from the query using heuristic rules. However, these methods can become inefficient when an improper rule is used to build the query plan tree. To remedy this issue, this paper aims to build a good query plan tree with an efficiency guarantee. We propose a novel method to build an optimal query plan tree with the minimal expected matching cost for index-based regex matching. Although computing an optimal query plan tree is NP-hard even under strong assumptions, we propose a pseudo-polynomial time algorithm to build one. Finally, extensive experiments on real-world data sets show that our method outperforms state-of-the-art methods.
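
To make the idea of a positional q-gram inverted index concrete, the following is a minimal sketch in Python. It is not the paper's algorithm and involves no query plan tree: it only illustrates how occurrences of a literal can be recovered from position lists alone, without scanning the text. The function names `build_index` and `find_literal`, the choice q = 2, and the sample text are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(text, q=2):
    """Map each q-gram to the ascending list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def find_literal(index, pattern, q=2):
    """Return the start positions of `pattern`, using only the index.

    The pattern is split into overlapping q-grams. A start position p
    survives only if the k-th gram occurs at position p + k for every k.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    grams = [pattern[k:k + q] for k in range(len(pattern) - q + 1)]
    # Candidate starts come from the first gram's position list.
    candidates = set(index.get(grams[0], []))
    for k, gram in enumerate(grams[1:], start=1):
        # Shift each position back by k so it refers to the pattern start.
        shifted = {p - k for p in index.get(gram, [])}
        candidates &= shifted
        if not candidates:
            break  # no start position can match any more
    return sorted(candidates)


text = "abcabdabc"  # hypothetical sample text
index = build_index(text)
print(find_literal(index, "abc"))  # [0, 6]
```

In this sketch the order in which position lists are intersected is fixed by the pattern. For a full regex with alternation and repetition there are many possible orders, and, as the abstract notes, the choice of query plan tree is what determines the matching cost.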

This work is partly supported by the National Natural Science Foundation of China (Nos. 62002245, U22A2025, 62072088, 62232007, 61802268), Ten Thousand Talent Program (No. ZX20200035), Liaoning Distinguished Professor (No. XLYC1902057), and the Natural Science Foundation of Liaoning Province (Nos. 2022-BS-218, 2022-MS-303, 2022-MS-302).

Author information

Corresponding author

Correspondence to Tao Qiu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, T., Yang, X., Wang, B., Zong, C., Zhu, R., Xia, X. (2023). Efficient Index-Based Regular Expression Matching with Optimal Query Plan Tree. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_3

  • DOI: https://doi.org/10.1007/978-3-031-30637-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science, Computer Science (R0)
