Abstract
We revisit the popular delayed deterministic finite automaton (D2FA) compression algorithm introduced by Kumar et al. [SIGCOMM 2006] for compressing deterministic finite automata (DFAs) used in intrusion detection systems. This compression scheme exploits similarities in the outgoing sets of transitions among states to achieve strong compression while maintaining high throughput for matching.
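To make the idea of shared outgoing transitions concrete, the following is a minimal sketch of matching with default transitions. It is not the construction from Kumar et al. The class name `ToyD2FA`, its fields, and the three-state example automaton are hypothetical and exist only for illustration. Each state stores just the transitions that differ from those of its default state, and a lookup follows default pointers until it finds a labeled transition.

```python
from typing import Dict, Optional


class ToyD2FA:
    """Toy delayed-input DFA (illustrative names, not from the paper).

    labeled[s] holds only the transitions of s that are stored explicitly.
    default[s] is the state consulted when s has no entry for a symbol.
    Assumption: every default chain ends in a state with a full table.
    """

    def __init__(self,
                 labeled: Dict[int, Dict[str, int]],
                 default: Dict[int, Optional[int]],
                 start: int) -> None:
        self.labeled = labeled
        self.default = default
        self.start = start

    def step(self, state: int, symbol: str) -> int:
        # Default transitions consume no input: keep following them
        # until some state on the chain has a labeled entry for `symbol`.
        while symbol not in self.labeled[state]:
            state = self.default[state]
        return self.labeled[state][symbol]

    def run(self, text: str) -> int:
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return state


# Hypothetical DFA over {a, b} with full tables
#   0: a->1, b->0    1: a->1, b->2    2: a->1, b->0
# States 1 and 2 share transitions with state 0, so they defer to it.
toy = ToyD2FA(
    labeled={0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}},
    default={0: None, 1: 0, 2: 0},
    start=0,
)
```

In this toy instance the compressed form stores three labeled and two default transitions instead of six labeled ones. The cost is that a single input symbol may require following more than one pointer.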
Unfortunately, the D2FA algorithm and its later variants require at least quadratic compression time, since they compare all pairs of states to compute an optimal compression. This is too slow and, in some cases, infeasible for the collections of regular expressions in modern intrusion detection systems, which produce DFAs with millions of states.
Our main result is a simple, general framework based on locality-sensitive hashing that constructs an approximation of the optimal D2FA in near-linear time. We apply our approach to the original D2FA compression algorithm and two important variants, and we experimentally evaluate our algorithms on DFAs from widely used modern intrusion detection systems. Overall, our new algorithms compress up to an order of magnitude faster than existing solutions with little or no loss in compression. Consequently, our algorithms are significantly more scalable and can handle larger collections of regular expressions than previous solutions.
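The abstract does not say which locality-sensitive hash family is used. As a rough sketch only, assuming MinHash over each state's set of (symbol, target) pairs, states with similar transition tables can be grouped into buckets so that candidate default states are searched within a bucket rather than across all pairs. The function names, the parameter `num_hashes`, and the use of Python's built-in `hash` are all assumptions made for this illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, int]  # symbol -> target state (one row of the DFA table)


def minhash_signature(row: Row, seeds: List[int]) -> Tuple[int, ...]:
    """One min-hash per seed over the (symbol, target) pairs of a row.

    For ideal hash functions, two rows agree on a component with
    probability equal to the Jaccard similarity of their pair sets.
    Assumes the row is non-empty, as in a complete DFA.
    """
    return tuple(
        min(hash((seed, symbol, target)) for symbol, target in row.items())
        for seed in seeds
    )


def candidate_buckets(dfa: Dict[int, Row],
                      num_hashes: int = 2,
                      rng_seed: int = 0) -> Dict[Tuple[int, ...], List[int]]:
    """Group states by signature; only states in the same bucket are compared."""
    rng = random.Random(rng_seed)
    seeds = [rng.getrandbits(32) for _ in range(num_hashes)]
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for state, row in dfa.items():
        buckets[minhash_signature(row, seeds)].append(state)
    return buckets


def shared_transitions(a: Row, b: Row) -> int:
    """Number of transitions two states have in common, i.e. the saving
    from making one the default state of the other."""
    return len(set(a.items()) & set(b.items()))
```

A single pass over the states builds the buckets, and the pairwise `shared_transitions` comparison is then restricted to states that collide. Using more hashes per signature makes collisions rarer and more selective. How the paper balances this trade-off is not stated in the abstract.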
Partially supported by the Independent Research Fund Denmark (DFF-9131-00069B and 10.46540/3105-00302B).
Notes
- 1. The running time is not explicitly stated in the paper, but follows from the description.
References
Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855
Antonello, R., Fernandes, S.F.L., Sadok, D., Kelner, J., Szabó, G.: Deterministic finite automaton for scalable traffic identification: the power of compressing by range. In: NOMS 2012, pp. 155–162 (2012). https://doi.org/10.1109/NOMS.2012.6211894
Antonello, R., Fernandes, S.F.L., Sadok, D.F.H., Kelner, J., Szabó, G.: Design and optimizations for efficient regular expression matching in DPI systems. Comput. Commun. 61, 103–120 (2015). https://doi.org/10.1016/j.comcom.2014.12.011
Becchi, M., Cadambi, S.: Memory-efficient regular expression search using state merging. In: Proceedings of 26th INFOCOM, pp. 1064–1072 (2007). https://doi.org/10.1109/INFCOM.2007.128
Becchi, M., Crowley, P.: A hybrid finite automaton for practical deep packet inspection. In: Proceedings of 3rd CoNEXT Conference, pp. 1–12 (2007)
Becchi, M., Crowley, P.: An improved algorithm to accelerate regular expression evaluation. In: Proceedings of ANCS 2007, pp. 145–154 (2007). https://doi.org/10.1145/1323548.1323573
Becchi, M., Crowley, P.: Efficient regular expression evaluation: theory to practice. In: Proceedings of ANCS 2008, pp. 50–59 (2008). https://doi.org/10.1145/1477942.1477950
Becchi, M., Crowley, P.: A-DFA: A time- and space-efficient DFA compression algorithm for fast regular expression evaluation. ACM Trans. Archit. Code Optim. 10(1), 4:1–4:26 (2013). https://doi.org/10.1145/2445572.2445576
Bille, P., Gørtz, I.L., Pedersen, M.R.: Fast practical compression of deterministic finite automata. arXiv:2306.12771 (2024)
Bille, P., Gørtz, I.L., Puglisi, S.J., Tarnow, S.R.: Hierarchical relative Lempel-Ziv compression. In: Proceedings of 21st SEA (2023)
Black, J., Halevi, S., Krawczyk, H., Krovetz, T., Rogaway, P.: UMAC: fast and secure message authentication. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 216–233. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_14
Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of SEQUENCES, pp. 21–29 (1997)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000). https://doi.org/10.1006/jcss.1999.1690
Brodie, B.C., Taylor, D.E., Cytron, R.K.: A scalable architecture for high-throughput regular-expression pattern matching. In: Proceedings of 33rd ISCA, pp. 191–202 (2006). https://doi.org/10.1109/ISCA.2006.7
Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of 34th STOC, pp. 380–388 (2002). https://doi.org/10.1145/509907.509965
Ding, S., Attenberg, J., Suel, T.: Scalable techniques for document identifier assignment in inverted indexes. In: Proceedings of 19th WWW, pp. 311–320 (2010)
Douglis, F., Iyengar, A.: Application-specific delta-encoding via resemblance detection. In: Proceedings of USENIX ATC, General Track 2003, pp. 113–126 (2003). http://www.usenix.org/events/usenix03/tech/douglis.html
Farley, A.M., Hedetniemi, S.T., Proskurowski, A.: Partitioning trees: matching, domination, and maximum diameter. Int. J. Parallel Program. 10(1), 55–61 (1981). https://doi.org/10.1007/BF00978378
Ficara, D., Giordano, S., Procissi, G., Vitucci, F., Antichi, G., Pietro, A.D.: An improved DFA for fast regular expression matching. Comput. Commun. Rev. 38(5), 29–40 (2008). https://doi.org/10.1145/1452335.1452339
Ficara, D., Pietro, A.D., Giordano, S., Procissi, G., Vitucci, F., Antichi, G.: Differential encoding of DFAs for fast regular expression matching. IEEE/ACM Trans. Netw. 19(3), 683–694 (2011). https://doi.org/10.1109/TNET.2010.2089639
Gong, L., Wang, C., Xia, H., Chen, X., Li, X., Zhou, X.: Enabling fast and memory-efficient acceleration for pattern matching workloads: the lightweight automata processing engine. IEEE Trans. Comput. 72(4), 1011–1025 (2023). https://doi.org/10.1109/TC.2022.3187338
Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theory Comput. 8(1), 321–350 (2012). https://doi.org/10.4086/toc.2012.v008a014
Hemmingsen, M., Lam, B.W.: Fast Compression of DFAs for Intrusion Detection Systems. Master’s thesis, Technical University of Denmark (2021)
Kong, S., Smith, R., Estan, C.: Efficient signature matching with multiple alphabet compression tables. In: Proceedings of 4th SECURECOMM, p. 1 (2008). https://doi.org/10.1145/1460877.1460879
Krcál, L., Holub, J.: Incremental locality and clustering-based compression. In: DCC 2015, pp. 203–212 (2015). https://doi.org/10.1109/DCC.2015.23
Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)
Kulkarni, P., Douglis, F., LaVoie, J.D., Tracey, J.M.: Redundancy elimination within large collections of files. In: Proceedings of USENIX ATC, General Track 2004, pp. 59–72 (2004)
Kumar, S., Dharmapurikar, S., Yu, F., Crowley, P., Turner, J.S.: Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In: Proceedings of SIGCOMM 2006, pp. 339–350 (2006). https://doi.org/10.1145/1159913.1159952
Kumar, S., Turner, J.S., Williams, J.: Advanced algorithms for fast and scalable deep packet inspection. In: Proceedings of ANCS 2006, pp. 81–92 (2006). https://doi.org/10.1145/1185347.1185359
Liu, A.X., Torng, E.: An overlay automata approach to regular expression matching. In: Proceedings of 33rd INFOCOM, pp. 952–960 (2014). https://doi.org/10.1109/INFOCOM.2014.6848024
Liu, S., Su, S., Liu, D., Huang, Z., Xiao, M.: Efficient compression algorithm for ternary content addressable memory-based regular expression matching. Electron. Lett. 53(3), 152–154 (2017)
Matousek, D., Kubis, J., Matousek, J., Korenek, J.: Regular expression matching with pipelined delayed input DFAs for high-speed networks. In: Proceedings of ANCS 2018, pp. 104–110 (2018). https://doi.org/10.1145/3230718.3230730
Matousek, D., Matousek, J., Korenek, J.: High-speed regular expression matching with pipelined memory-based automata. In: Proceedings of 26th FCCM, p. 214 (2018). https://doi.org/10.1109/FCCM.2018.00048
Meiners, C.R., Patel, J., Norige, E., Torng, E., Liu, A.X.: Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In: 19th USENIX Security, pp. 111–126 (2010)
Ouyang, Z., Memon, N.D., Suel, T., Trendafilov, D.: Cluster-based delta compression of a collection of files. In: Proceedings of 3rd WISE, pp. 257–268 (2002). https://doi.org/10.1109/WISE.2002.1181662
Patel, J., Liu, A.X., Torng, E.: Bypassing space explosion in high-speed regular expression matching. IEEE/ACM Trans. Netw. 22(6), 1701–1714 (2014). https://doi.org/10.1109/TNET.2014.2309014
Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23–24), 2435–2463 (1999)
Peel, A., Wirth, A., Zobel, J.: Collection-based compression using discovered long matching strings. In: Proceedings of 20th CIKM, pp. 2361–2364 (2011). https://doi.org/10.1145/2063576.2063967
Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6), 1389–1401 (1957)
Prithi, S., Sumathi, S.: A survey on recent DFA compression techniques for deep packet inspection in network intrusion detection system. J. Electr. Eng. 17(3), 14–14 (2017)
Qi, Y., et al.: FEACAN: front-end acceleration for content-aware network processing. In: Proceedings of 30th INFOCOM, pp. 2114–2122 (2011). https://doi.org/10.1109/INFCOM.2011.5935021
Roesch, M.: Snort: lightweight intrusion detection for networks. In: Proceedings of 13th LISA, pp. 229–238 (1999)
Roussev, V.: Data fingerprinting with similarity digests. In: IFIP International Conference Digital Forensics 2010, vol. 337, pp. 207–226 (2010). https://doi.org/10.1007/978-3-642-15506-2_15
Shankar, S.S., Lin, P., Herkersdorf, A., Wild, T.: A divide and conquer state grouping method for bitmap based transition compression. In: Proceedings of 18th PDCAT, pp. 400–406 (2017). https://doi.org/10.1109/PDCAT.2017.00071
Shilane, P., Huang, M., Wallace, G., Hsu, W.: Wan-optimized replication of backup datasets using stream-informed delta compression. ACM Trans. Storage 8(4), 13:1–13:26 (2012). https://doi.org/10.1145/2385603.2385606
Tang, Q., Jiang, L., Dai, Q., Su, M., Xie, H., Fang, B.: RICS-DFA: a space and time-efficient signature matching algorithm with reduced input character set. Concurr. Comput. Pract. Exp. 29(20) (2017). https://doi.org/10.1002/cpe.3940
Tuck, N., Sherwood, T., Calder, B., Varghese, G.: Deterministic memory-efficient string matching algorithms for intrusion detection. In: Proceedings of 23rd INFOCOM, pp. 2628–2639 (2004). https://doi.org/10.1109/INFCOM.2004.1354682
Xia, W., Jiang, H., Feng, D., Hua, Y.: Silo: a similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput. In: USENIX ATC 2011 (2011)
Xu, C., Chen, S., Su, J., Yiu, S., Hui, L.C.K.: A survey on regular expression matching for deep packet inspection: applications, algorithms, and hardware platforms. IEEE Commun. Surv. Tutorials 18(4), 2991–3029 (2016). https://doi.org/10.1109/COMST.2016.2566669
Yu, F., Chen, Z., Diao, Y., Lakshman, T.V., Katz, R.H.: Fast and memory-efficient regular expression matching for deep packet inspection. In: Proceedings of ANCS 2006, pp. 93–102 (2006). https://doi.org/10.1145/1185347.1185360
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bille, P., Gørtz, I.L., Pedersen, M.R. (2025). Fast Practical Compression of Deterministic Finite Automata. In: Královič, R., Kůrková, V. (eds) SOFSEM 2025: Theory and Practice of Computer Science. SOFSEM 2025. Lecture Notes in Computer Science, vol 15538. Springer, Cham. https://doi.org/10.1007/978-3-031-82670-2_11
DOI: https://doi.org/10.1007/978-3-031-82670-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82669-6
Online ISBN: 978-3-031-82670-2
eBook Packages: Computer Science, Computer Science (R0)