Abstract
We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. Unlike the naïve approach, the algorithm avoids unfolding counters into control states and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.e., CAs with counting loops on character classes, which are common in practice. Besides applications in verification and decision procedures of logics, our main motivation is the use of deterministic (M)CAs in pattern matching of regular expressions with counting, which are very common in, e.g., network traffic processing and log analysis. We have evaluated our algorithm against practical benchmarks from these application domains and concluded that, compared to the naïve approach, our algorithm is much less prone to explode, produces automata that can be several orders of magnitude smaller, and is overall faster.
This work has been supported by the Czech Science Foundation (project No. 19-24397S), the IT4Innovations Excellence in Science (project No. LQ1602), and the FIT BUT internal project FIT-S-17-4014.
Notes
- 1.
To handle large or infinite sets of symbols symbolically, the predicates \(\texttt {l}= a\) may be generalised to predicates from an arbitrary effective Boolean algebra, as in [6].
- 2.
A Boolean combination of atomic guards and updates can be factorised through (1) a transformation to DNF, yielding a set of clauses X; (2) writing each clause \(\varphi \in X\) as a conjunction of a guard formula \(g_\varphi \) and an assignment formula \(f_\varphi \); (3) computing minterms of the set \(\{g_\varphi \mid \varphi \in X\}\); (4) creating one factor \((g)\wedge (f)\) from every minterm g where f is the disjunction of all the assignment formulae \(f_\varphi \) with \(\varphi \in X\) compatible with g (i.e., such that \(g\wedge f_\varphi \) is satisfiable).
- 3.
We note that we only need to use a specialised, simple, and cheap quantifier elimination. In particular, we only need to eliminate counter variables c from formulae such that, in clauses of their DNF, c always appears together with a predicate \(c=p\) where p is a parameter. Eliminating c from such a DNF clause is then done by simply substituting occurrences of c by p. We do not need complex algorithms such as the general quantifier elimination for Presburger arithmetic.
- 4.
The choice of the parameters in the image of \(\theta _{ at }: at ( u _i)\rightarrow \mathcal {P}'\) on line 9 is arbitrary, although, in practice, it would be sensible to define some systematic parameter naming policy and reuse existing parameters whenever possible.
- 5.
For this step to preserve the language of the automaton, we need to assume that the input CA does not assign nondeterministic values to live counters. We are referring to the standard notion: a counter is live at a state if the value it holds at that state may influence satisfaction of some guard in the future. Any CA can be transformed into this form, and CAs we compile from regular expressions satisfy this condition by construction.
- 6.
We note that we restrict ourselves to range sub-expressions of the form \(\sigma \{n,n\}\) or \(\sigma \{0,n\}\) only. This is without loss of generality since a general range expression \(\sigma \{m,n\}\) can be rewritten as \(\sigma \{m,m\}.\sigma \{0,n-m\}\).
- 7.
Notice that the guards \(c_q < {{\varvec{max}}}_{q}\) on the incrementing self-loops of exact counting states could be removed without affecting the language: once \(c_q\) exceeds \({{\varvec{max}}}_{q}\), the run can never leave q and thus has no chance of accepting. We include these guards only to conform to the condition on boundedness of counter values in the definition of CAs.
- 8.
Notice that maintaining a fixed association of a parameter to a counter is a difference from Algorithms 1 and 2, where one parameter may represent different counters.
- 9.
That this relation is indeed a simulation can be seen from the fact that both the higher and the lower value of \(c_q\) can use any exit transition of q at any moment, regardless of the value of \(c_q\), but the lower value of \(c_q\) can stay in the counting loop longer.
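The factorisation outlined in Note 2 can be illustrated with a minimal Python sketch. It assumes that steps (1) and (2) have already been carried out, so the input is a list of clauses given as (guard, assignment) pairs, and it represents each guard explicitly by the set of valuations satisfying it over a small finite universe. Compatibility of a clause with a minterm is read here as the minterm implying the clause's guard. All names (`satisfying`, `minterms`, `factorise`, `UNIVERSE`, `CLAUSES`) and the three example clauses are hypothetical and serve only as an illustration; an actual implementation would work symbolically rather than by enumeration.

```python
from itertools import product


def satisfying(predicate, universe):
    """Represent a guard by the set of valuations that satisfy it."""
    return frozenset(v for v in universe if predicate(*v))


def minterms(guards, universe):
    """Step (3): satisfiable conjunctions taking each guard or its negation."""
    regions = [frozenset(universe)]
    for guard in guards:
        # Split every region by the guard and keep only the non-empty parts.
        regions = [part
                   for region in regions
                   for part in (region & guard, region - guard)
                   if part]
    return regions


def factorise(clauses, universe):
    """Step (4): one factor per minterm, paired with its compatible updates."""
    guards = [guard for guard, _ in clauses]
    factors = []
    for minterm in minterms(guards, universe):
        # A minterm lies either inside a guard or entirely outside of it.
        updates = frozenset(u for guard, u in clauses if minterm <= guard)
        if updates:  # a minterm with no compatible clause yields no factor
            factors.append((minterm, updates))
    return factors


# Hypothetical example: valuations are pairs (letter, counter value).
UNIVERSE = list(product("ab", range(4)))
CLAUSES = [
    (satisfying(lambda letter, c: letter == "a" and c < 3, UNIVERSE), "c' = c + 1"),
    (satisfying(lambda letter, c: letter == "a", UNIVERSE), "c' = 0"),
    (satisfying(lambda letter, c: letter == "b", UNIVERSE), "c' = c"),
]
FACTORS = factorise(CLAUSES, UNIVERSE)
```

In this example the two overlapping guards on the letter a are split into two disjoint minterms (counter below 3, and counter equal to 3), so the resulting factors have pairwise disjoint guards while the nondeterminism is moved into the set of updates attached to each guard.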
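As a small worked instance of the substitution-based elimination described in Note 3, consider a single DNF clause in which the counter variable c occurs together with the predicate \(c=p\). The concrete clause below is a hypothetical example chosen only for illustration; \(c'\) stands for the updated counter value and \(\mathit{max}\) for an assumed counter bound.

\[
\exists c.\ \bigl(c = p \;\wedge\; c < \mathit{max} \;\wedge\; c' = c + 1\bigr)
\;\;\equiv\;\;
p < \mathit{max} \;\wedge\; c' = p + 1
\]

The right-hand side is obtained purely by replacing every occurrence of c with p and dropping the now trivial conjunct \(p = p\).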
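The rewriting of range expressions from Note 6 can be checked empirically with a short Python sketch. It uses the syntax of Python's `re` module for the counting operator and compares the two expressions on all words up to a bounded length, so it is a sanity check rather than a proof. The helper names `rewrite_range` and `same_language_up_to` are assumptions introduced for this illustration.

```python
import re
from itertools import product


def rewrite_range(sigma: str, m: int, n: int) -> str:
    """Rewrite sigma{m,n} as sigma{m,m} followed by sigma{0,n-m}."""
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    return f"(?:{sigma}){{{m},{m}}}(?:{sigma}){{0,{n - m}}}"


def same_language_up_to(sigma, m, n, alphabet="ab", max_len=8):
    """Compare sigma{m,n} with its rewriting on all words of length <= max_len."""
    original = re.compile(f"(?:{sigma}){{{m},{n}}}")
    rewritten = re.compile(rewrite_range(sigma, m, n))
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if bool(original.fullmatch(word)) != bool(rewritten.fullmatch(word)):
                return False
    return True
```

For instance, `rewrite_range("a", 2, 5)` yields the pattern `(?:a){2,2}(?:a){0,3}`, and `same_language_up_to("[ab]", 1, 3)` compares `[ab]{1,3}` with its rewritten form on all words over the letters a and b of length at most 8.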
References
Abdulla, P.A., Krcal, P., Yi, W.: R-automata. In: van Breugel, F., Chechik, M. (eds.) CONCUR 2008. LNCS, vol. 5201, pp. 67–81. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85361-9_9
Bardin, S., Finkel, A., Leroux, J., Petrucci, L.: FAST: acceleration from theory to practice. STTT 10(5) (2008)
Börklund, E., Martens, W., Timm, T.: Efficient incremental evaluation of succinct regular expressions. In: Proceedings of CIKM 2015, ACM (2015)
Chen, H., Lu, P.: Checking determinism of regular expressions with counting. Inf. Comput. 241, 302–320 (2015)
Cheng, K., Krishnakumar, A.S.: Automatic functional test generation using the extended finite state machine model. In: Proceedings of DAC 1993, ACM Press (1993)
D’Antoni, L., Veanes, M.: Minimization of symbolic automata. In: Proceedings of POPL 2014, ACM (2014)
Dill, D.L., Hu, A.J., Wong-Toi, H.: Checking for language inclusion using simulation preorders. In: Larsen, K.G., Skou, A. (eds.) CAV 1991. LNCS, vol. 575, pp. 255–265. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-55179-4_25
Gelade, W., Martens, W., Neven, F.: Optimizing schema languages for XML: numerical constraints and interleaving. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 269–283. Springer, Heidelberg (2006). https://doi.org/10.1007/11965893_19
Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: weak versus strong determinism. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 369–381. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03816-7_32
van Glabbeek, R., Ploeger, B.: Five determinisation algorithms. In: Ibarra, O.H., Ravikumar, B. (eds.) CIAA 2008. LNCS, vol. 5148, pp. 161–170. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70844-5_17
Heizmann, M., Hoenicke, J., Podelski, A.: Software model checking for people who love automata. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 36–52. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_2
Henriksen, J.G., et al.: Mona: monadic second-order logic in practice. In: Brinksma, E., Cleaveland, W.R., Larsen, K.G., Margaria, T., Steffen, B. (eds.) TACAS 1995. LNCS, vol. 1019, pp. 89–110. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60630-0_5
Hovland, D.: Regular expressions with numerical constraints and automata with counters. In: Leucker, M., Morgan, C. (eds.) ICTAC 2009. LNCS, vol. 5684, pp. 231–245. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03466-4_15
Hovland, D.: The membership problem for regular expressions with unordered concatenation and numerical constraints. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 313–324. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_27
Kilpeläinen, P., Tuhkanen, R.: One-unambiguity of regular expressions with numeric occurrence indicators. Inf. Comput. 205(6), 890–916 (2007)
Lengál, O., Šimáček, J., Vojnar, T.: VATA: a library for efficient manipulation of non-deterministic tree automata. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 79–94. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28756-5_7
Roesch, M., et al.: Snort: A Network Intrusion Detection and Prevention System. http://www.snort.org
Microsoft Automata Library: Automata and Transducer Library for .NET. https://github.com/AutomataDotNet/Automata
OWASP Foundation and Checkmarx: Regular Expression Denial of Service: ReDoS (2017)
RegExLib.com: The Internet’s First Regular Expression Library. http://regexlib.com/
Sommer, R., et al.: The Bro Network Security Monitor. http://www.bro.org
Shiple, T.R., Kukula, J.H., Ranjan, R.K.: A comparison of Presburger engines for EFSM reachability. In: Hu, A.J., Vardi, M.Y. (eds.) CAV 1998. LNCS, vol. 1427, pp. 280–292. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028752
Smith, R., Estan, C., Jha, S.: XFA: faster signature matching with extended automata. In: Proceedings of SSP 2008, IEEE (2008)
Smith, R., Estan, C., Jha, S., Siahaan, I.: Fast signature matching using extended finite automaton (XFA). In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 158–172. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_15
Sperberg-McQueen, M.: Notes on Finite State Automata with Counters. https://www.w3.org/XML/2004/05/msm-cfa.html. Accessed 08 Aug 2018
The Sagan Team: The Sagan Log Analysis Engine. https://quadrantsec.com/sagan_log_analysis_engine/
Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Češka, M., Havlena, V., Holík, L., Lengál, O., Vojnar, T.: Approximate reduction of finite automata for high-speed network intrusion detection. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 155–175. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89963-3_9
Yang, L., Karim, R., Ganapathy, V., Smith, R.: Improving NFA-based signature matching using ordered binary decision diagrams. In: Jha, S., Sommer, R., Kreibich, C. (eds.) RAID 2010. LNCS, vol. 6307, pp. 58–78. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15512-3_4
© 2019 Springer Nature Switzerland AG
Cite this paper
Holík, L., Lengál, O., Saarikivi, O., Turoňová, L., Veanes, M., Vojnar, T. (2019). Succinct Determinisation of Counting Automata via Sphere Construction. In: Lin, A. (eds) Programming Languages and Systems. APLAS 2019. Lecture Notes in Computer Science, vol 11893. Springer, Cham. https://doi.org/10.1007/978-3-030-34175-6_24
DOI: https://doi.org/10.1007/978-3-030-34175-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34174-9
Online ISBN: 978-3-030-34175-6
eBook Packages: Computer Science (R0)