Succinct Determinisation of Counting Automata via Sphere Construction

Holík, Lukáš; Lengál, Ondřej; Saarikivi, Olli; Turoňová, Lenka; Veanes, Margus; Vojnar, Tomáš

doi:10.1007/978-3-030-34175-6_24

Conference paper
First Online: 18 November 2019

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11893)

Abstract

We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.e., CAs with counting loops on character classes, which are common in practice. Our main motivation is (besides applications in verification and decision procedures of logics) the application of deterministic (M)CAs in pattern matching regular expressions with counting, which are very common in e.g. network traffic processing and log analysis. We have evaluated our algorithm against practical benchmarks from these application domains and concluded that compared to the naïve approach, our algorithm is much less prone to explode, produces automata that can be several orders of magnitude smaller, and is overall faster.

This work has been supported by the Czech Science Foundation (project No. 19-24397S), the IT4Innovations Excellence in Science (project No. LQ1602), and the FIT BUT internal project FIT-S-17-4014.

Notes

1.
To handle large or infinite sets of symbols symbolically, the predicates \(\texttt{l} = a\) may be generalised to predicates from an arbitrary effective Boolean algebra, as in [6].
2.
A Boolean combination of atomic guards and updates can be factorised through (1) a transformation to DNF, yielding a set of clauses X; (2) writing each clause \(\varphi \in X\) as a conjunction of a guard formula \(g_\varphi \) and an assignment formula \(f_\varphi \); (3) computing minterms of the set \(\{g_\varphi \mid \varphi \in X\}\); (4) creating one factor \((g)\wedge (f)\) from every minterm g where f is the disjunction of all the assignment formulae \(f_\varphi \) with \(\varphi \in X\) compatible with g (i.e., such that \(g\wedge f_\varphi \) is satisfiable).
3.
We note that we only need to use a specialised, simple, and cheap quantifier elimination. In particular, we only need to eliminate counter variables c from formulae such that, in clauses of their DNF, c always appears together with a predicate \(c=p\) where p is a parameter. Eliminating c from such a DNF clause is then done by simply substituting occurrences of c by p. We do not need complex algorithms such as the general quantifier elimination for Presburger arithmetic.
4.
The choice of the parameters in the image of \(\theta_{at} : at(u_i) \rightarrow \mathcal{P}'\) on line 9 is arbitrary, although, in practice, it would be sensible to define some systematic parameter naming policy and reuse existing parameters whenever possible.
5.
For this step to preserve the language of the automaton, we need to assume that the input CA does not assign nondeterministic values to live counters. We are referring to the standard notion: a counter is live at a state if the value it holds at that state may influence the satisfaction of some guard in the future. Any CA can be transformed into this form, and the CAs we compile from regular expressions satisfy this condition by construction.
6.
We note that we restrict ourselves to range sub-expressions of the form \(\sigma \{n,n\}\) or \(\sigma \{0,n\}\) only. This is without loss of generality since a general range expression \(\sigma \{m,n\}\) can be rewritten as \(\sigma \{m,m\}.\sigma \{0,n-m\}\).
7.
Notice that the guards \(c_q < {{\varvec{max}}}_{q}\) on the incrementing self-loops of exact counting states could be removed without affecting the language: once \(c_q\) exceeds \({{\varvec{max}}}_{q}\), the run can never leave q and thus has no chance of accepting. We include these guards only to conform to the condition on boundedness of counter values in the definition of CAs.
8.
Notice that maintaining a fixed association of a parameter to a counter is a difference from Algorithms 1 and 2, where one parameter may represent different counters.
9.
That this relation is indeed a simulation follows from the fact that both the higher and the lower value of \(c_q\) can use any exit transition of q at any moment, regardless of the value of \(c_q\), while the lower value of \(c_q\) can stay in the counting loop longer.
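
The rewriting of range expressions mentioned in Note 6 can be illustrated with a minimal sketch. It is not taken from the paper: the function names `split_range` and `same_language` are hypothetical, the regex syntax is that of Python's `re` module rather than the paper's notation, and the equivalence check is only a bounded brute-force test over short words, not a proof.

```python
import re
from itertools import product


def split_range(sigma: str, m: int, n: int) -> str:
    """Rewrite sigma{m,n} as sigma{m,m} followed by sigma{0,n-m}.

    `sigma` is a regex fragment in Python `re` syntax; it is wrapped in a
    non-capturing group so that the counter applies to the whole fragment.
    """
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    return f"(?:{sigma}){{{m},{m}}}(?:{sigma}){{0,{n - m}}}"


def same_language(sigma: str, m: int, n: int,
                  alphabet: str = "ab", max_len: int = 7) -> bool:
    """Compare sigma{m,n} with its rewritten form on all words up to max_len.

    This is a bounded sanity check (an assumption of this sketch),
    not a decision procedure for language equivalence.
    """
    original = re.compile(f"(?:{sigma}){{{m},{n}}}")
    rewritten = re.compile(split_range(sigma, m, n))
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            word = "".join(chars)
            if bool(original.fullmatch(word)) != bool(rewritten.fullmatch(word)):
                return False
    return True


if __name__ == "__main__":
    print(split_range("a", 2, 5))
    print(same_language("[ab]", 1, 3, max_len=5))
```

After the rewrite, every counting sub-expression has one of the two restricted shapes, an exact count `{m,m}` or an upper-bounded count `{0,k}`, which is the form the note assumes.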

References

1. Abdulla, P.A., Krcal, P., Yi, W.: R-automata. In: van Breugel, F., Chechik, M. (eds.) CONCUR 2008. LNCS, vol. 5201, pp. 67–81. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85361-9_9
2. Bardin, S., Finkel, A., Leroux, J., Petrucci, L.: FAST: acceleration from theory to practice. STTT 10(5) (2008)
3. Björklund, H., Martens, W., Timm, T.: Efficient incremental evaluation of succinct regular expressions. In: Proceedings of CIKM 2015, ACM (2015)
4. Chen, H., Lu, P.: Checking determinism of regular expressions with counting. Inf. Comput. 241, 302–320 (2015)
5. Cheng, K., Krishnakumar, A.S.: Automatic functional test generation using the extended finite state machine model. In: Proceedings of DAC 1993, ACM Press (1993)
6. D’Antoni, L., Veanes, M.: Minimization of symbolic automata. In: Proceedings of POPL 2014, ACM (2014)
7. Dill, D.L., Hu, A.J., Wong-Toi, H.: Checking for language inclusion using simulation preorders. In: Larsen, K.G., Skou, A. (eds.) CAV 1991. LNCS, vol. 575, pp. 255–265. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-55179-4_25
8. Gelade, W., Martens, W., Neven, F.: Optimizing schema languages for XML: numerical constraints and interleaving. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 269–283. Springer, Heidelberg (2006). https://doi.org/10.1007/11965893_19
9. Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: weak versus strong determinism. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 369–381. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03816-7_32
10. van Glabbeek, R., Ploeger, B.: Five Determinisation algorithms. In: Ibarra, O.H., Ravikumar, B. (eds.) CIAA 2008. LNCS, vol. 5148, pp. 161–170. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70844-5_17
11. Heizmann, M., Hoenicke, J., Podelski, A.: Software model checking for people who love automata. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 36–52. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_2
12. Henriksen, J.G., et al.: Mona: monadic second-order logic in practice. In: Brinksma, E., Cleaveland, W.R., Larsen, K.G., Margaria, T., Steffen, B. (eds.) TACAS 1995. LNCS, vol. 1019, pp. 89–110. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60630-0_5
13. Hovland, D.: Regular expressions with numerical constraints and automata with counters. In: Leucker, M., Morgan, C. (eds.) ICTAC 2009. LNCS, vol. 5684, pp. 231–245. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03466-4_15
14. Hovland, D.: The membership problem for regular expressions with unordered concatenation and numerical constraints. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 313–324. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_27
15. Kilpeläinen, P., Tuhkanen, R.: One-unambiguity of regular expressions with numeric occurrence indicators. Inf. Comput. 205(6), 890–916 (2007)
16. Lengál, O., Šimáček, J., Vojnar, T.: VATA: a library for efficient manipulation of non-deterministic tree automata. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 79–94. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28756-5_7
17. Roesch, M., et al.: Snort: A Network Intrusion Detection and Prevention System. http://www.snort.org
18. Microsoft Automata Library: Automata and Transducer Library for .NET. https://github.com/AutomataDotNet/Automata
19. OWASP Foundation and Checkmarx: Regular Expression Denial of Service: ReDoS (2017)
20. RegExLib.com: The Internet’s First Regular Expression Library. http://regexlib.com/
21. Sommer, R., et al.: The Bro Network Security Monitor. http://www.bro.org
22. Shiple, T.R., Kukula, J.H., Ranjan, R.K.: A comparison of Presburger engines for EFSM reachability. In: Hu, A.J., Vardi, M.Y. (eds.) CAV 1998. LNCS, vol. 1427, pp. 280–292. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028752
23. Smith, R., Estan, C., Jha, S.: XFA: faster signature matching with extended automata. In: Proceedings of SSP 2008, IEEE (2008)
24. Smith, R., Estan, C., Jha, S., Siahaan, I.: Fast signature matching using extended finite automaton (XFA). In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 158–172. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_15
25. Sperberg-McQueen, M.: Notes on Finite State Automata with Counters. https://www.w3.org/XML/2004/05/msm-cfa.html. Accessed 08 Aug 2018
26. The Sagan Team: The Sagan Log Analysis Engine. https://quadrantsec.com/sagan_log_analysis_engine/
27. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
28. Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 155–175. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3_9
29. Yang, L., Karim, R., Ganapathy, V., Smith, R.: Improving NFA-based signature matching using ordered binary decision diagrams. In: Jha, S., Sommer, R., Kreibich, C. (eds.) RAID 2010. LNCS, vol. 6307, pp. 58–78. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15512-3_4

Author information

Authors and Affiliations

FIT, IT4Innovations Centre of Excellence, Brno University of Technology, Brno, Czech Republic
Lukáš Holík, Ondřej Lengál, Lenka Turoňová & Tomáš Vojnar
Microsoft Research, Redmond, USA
Olli Saarikivi & Margus Veanes

Corresponding author

Correspondence to Margus Veanes.

Editor information

Editors and Affiliations

University of Kaiserslautern, Kaiserslautern, Germany
Anthony Widjaja Lin

About this paper

Cite this paper

Holík, L., Lengál, O., Saarikivi, O., Turoňová, L., Veanes, M., Vojnar, T. (2019). Succinct Determinisation of Counting Automata via Sphere Construction. In: Lin, A. (ed.) Programming Languages and Systems. APLAS 2019. Lecture Notes in Computer Science, vol 11893. Springer, Cham. https://doi.org/10.1007/978-3-030-34175-6_24

DOI: https://doi.org/10.1007/978-3-030-34175-6_24
Published: 18 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34174-9
Online ISBN: 978-3-030-34175-6
eBook Packages: Computer Science, Computer Science (R0)
