Efficient Index-Based Regular Expression Matching with Optimal Query Plan Tree

Qiu, Tao; Yang, Xiaochun; Wang, Bin; Zong, Chuanyu; Zhu, Rui; Xia, Xiufeng

doi:10.1007/978-3-031-30637-2_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13943)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Matching a regular expression (regex) against a text arises in many applications, such as entity matching, protein sequence matching, and shell commands. Classical methods for regex matching usually rely on finite automata, which incur a high matching cost. Recent methods instead use the positional q-gram inverted index, one of the most widely used index schemes, and obtain all matching results directly from the index. The efficiency of these methods depends critically on the query plan tree, which is built from the query using heuristic rules; when an improper rule is used, they can become inefficient. To remedy this, this paper aims to build a good query plan tree with an efficiency guarantee. We propose a novel method for building an optimal query plan tree, that is, one with the minimal expected matching cost, for index-based regex matching. Although computing an optimal query plan tree is NP-hard even under strong assumptions, we propose a pseudo-polynomial time algorithm that builds one. Extensive experiments on real-world data sets show that our method outperforms state-of-the-art methods.
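
As an illustration of the index the abstract refers to, the following is a minimal Python sketch of a positional q-gram inverted index and a lookup for a single literal string. It is not the authors' algorithm: the function names `build_index` and `find_literal`, the choice q = 2, and the set-based intersection are assumptions made for this example. It also does not build a query plan tree; it only shows how occurrences can be recovered from position lists without scanning the text.

```python
from collections import defaultdict


def build_index(text, q=2):
    """Map each q-gram to the ascending list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def find_literal(index, literal, q=2):
    """Return the start positions of `literal`, using only the index.

    The literal is split into overlapping q-grams. A start position p
    survives only if the k-th gram occurs at position p + k for every k.
    """
    if len(literal) < q:
        raise ValueError("literal must be at least q characters long")
    grams = [literal[k:k + q] for k in range(len(literal) - q + 1)]
    # Start from the position list of the first gram, then filter.
    candidates = set(index.get(grams[0], []))
    for k, gram in enumerate(grams[1:], start=1):
        positions = set(index.get(gram, []))
        candidates = {p for p in candidates if p + k in positions}
        if not candidates:
            break  # no occurrence can survive further filtering
    return sorted(candidates)
```

In this sketch a literal is answered by intersecting shifted position lists. Which lists to read first, and how to combine them for a full regex, is the kind of decision a query plan tree would encode.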

This work is partly supported by the National Natural Science Foundation of China (Nos. 62002245, U22A2025, 62072088, 62232007, 61802268), Ten Thousand Talent Program (No. ZX20200035), Liaoning Distinguished Professor (No. XLYC1902057), and the Natural Science Foundation of Liaoning Province (Nos. 2022-BS-218, 2022-MS-303, 2022-MS-302).

Notes

1. http://www.regexlib.com/

Author information

Authors and Affiliations

School of Computer Science, Shenyang Aerospace University, Shenyang, China
Tao Qiu, Chuanyu Zong, Rui Zhu & Xiufeng Xia
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Xiaochun Yang & Bin Wang

Corresponding author

Correspondence to Tao Qiu.

Editor information

Editors and Affiliations

Tianjin University, Tianjin, China
Xin Wang
University of Torino, Turin, Italy
Maria Luisa Sapino
POSTECH, Pohang, Korea (Republic of)
Wook-Shin Han
University of California Santa Barbara, Santa Barbara, CA, USA
Amr El Abbadi
University of Auckland, Auckland, New Zealand
Gill Dobbie
Tianjin University, Tianjin, China
Zhiyong Feng
Beijing University of Posts and Telecommunications, Beijing, China
Yingxiao Shao
The University of Queensland, Brisbane, QLD, Australia
Hongzhi Yin

About this paper

Cite this paper

Qiu, T., Yang, X., Wang, B., Zong, C., Zhu, R., Xia, X. (2023). Efficient Index-Based Regular Expression Matching with Optimal Query Plan Tree. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_3

DOI: https://doi.org/10.1007/978-3-031-30637-2_3
Published: 14 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science, Computer Science (R0)
