Fast Practical Compression of Deterministic Finite Automata

  • Conference paper
SOFSEM 2025: Theory and Practice of Computer Science (SOFSEM 2025)

Abstract

We revisit the popular delayed deterministic finite automaton (D2FA) compression algorithm introduced by Kumar et al. [SIGCOMM 2006] for compressing deterministic finite automata (DFAs) used in intrusion detection systems. This compression scheme exploits similarities in the outgoing sets of transitions among states to achieve strong compression while maintaining high throughput for matching.
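As a rough illustration of the idea, the following minimal sketch shows how default transitions can replace transitions that a state shares with another state. The names `compress`, `step`, and the `default` parent map are hypothetical and chosen for this example only; the sketch assumes a complete DFA and a cycle-free choice of default pointers, and it does not reproduce how Kumar et al. select those pointers.

```python
def compress(dfa, default):
    """Keep, for each state, only the transitions that differ from its default parent.

    dfa:     {state: {symbol: target}} for a complete DFA.
    default: {state: parent}; assumed acyclic. States without an entry keep everything.
    """
    labeled = {}
    for s, trans in dfa.items():
        p = default.get(s)
        if p is None:
            labeled[s] = dict(trans)  # root of a default tree: store all transitions
        else:
            labeled[s] = {c: t for c, t in trans.items() if dfa[p][c] != t}
    return labeled


def step(labeled, default, s, c):
    """Follow default transitions (consuming no input) until c has a labeled transition."""
    while c not in labeled[s]:
        s = default[s]
    return labeled[s][c]


def run_dfa(dfa, start, text):
    s = start
    for c in text:
        s = dfa[s][c]
    return s


def run_d2fa(labeled, default, start, text):
    s = start
    for c in text:
        s = step(labeled, default, s, c)
    return s


# Toy DFA over {a, b}; states 0 and 2 have identical outgoing transitions.
dfa = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
default = {1: 0, 2: 0}  # hypothetical choice of default pointers
labeled = compress(dfa, default)
stored = sum(len(t) for t in labeled.values())
```

In this toy instance the compressed form stores 3 labeled transitions instead of 6, at the cost of occasionally following a default pointer during matching.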

Unfortunately, the D2FA algorithm and later variants of it require at least quadratic compression time since they compare all pairs of states to compute an optimal compression. This is too slow and, in some cases, even infeasible for collections of regular expressions in modern intrusion detection systems that produce DFAs with millions of states.

Our main result is a simple, general framework for constructing D2FAs based on locality-sensitive hashing that constructs an approximation of the optimal D2FA in near-linear time. We apply our approach to the original D2FA compression algorithm and two important variants, and we experimentally evaluate our algorithms on DFAs from widely used modern intrusion detection systems. Overall, our new algorithms compress up to an order of magnitude faster than existing solutions with little or no loss in compression. Consequently, our algorithms are significantly more scalable and can handle larger collections of regular expressions than previous solutions.
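To make the role of locality-sensitive hashing concrete, here is a minimal sketch of one standard LSH scheme, min-wise hashing with banding, applied to the sets of outgoing transitions. It is not the authors' algorithm: the function names, the parameters `bands`, `rows`, and `seed`, and the integer encoding of transitions are all assumptions made for this example. The point is only that states are compared when they collide in a bucket, rather than across all pairs.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # modulus for the linear hash functions


def make_hashes(k, seed=0):
    """Draw k hash functions x -> (a*x + b) mod PRIME from a seeded generator."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def signature(items, hashes):
    """Min-hash signature: the minimum hash value of the set under each function."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hashes)


def candidate_pairs(dfa, alphabet, bands=4, rows=2, seed=0):
    """Return pairs of states whose signatures agree on at least one band."""
    n = len(dfa)
    sym_id = {c: i for i, c in enumerate(alphabet)}
    state_id = {s: i for i, s in enumerate(dfa)}
    hashes = make_hashes(bands * rows, seed)
    buckets = defaultdict(list)
    for s, trans in dfa.items():
        # Encode each (symbol, target) transition as a single integer.
        items = [sym_id[c] * n + state_id[t] for c, t in trans.items()]
        sig = signature(items, hashes)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(s)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def shared_transitions(dfa, s, t):
    """Number of symbols on which s and t move to the same target."""
    return sum(1 for c in dfa[s] if dfa[s][c] == dfa[t].get(c))


# Toy DFA over {a, b}; states 0 and 2 have identical outgoing transitions.
dfa = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
pairs = candidate_pairs(dfa, "ab")
```

States with identical transition sets always receive identical signatures, so they always collide; states with partially overlapping sets collide with a probability that grows with their overlap. A single oversized bucket can still produce many pairs, so a practical implementation would need to bound bucket sizes, which this sketch does not do.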

Partially supported by the Independent Research Fund Denmark (DFF-9131-00069B and 10.46540/3105-00302B).

Notes

  1. The running time is not explicitly stated in the paper, but follows from the description.

References

  1. https://www.suricata.io/

  2. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855

  3. Antonello, R., Fernandes, S.F.L., Sadok, D., Kelner, J., Szabó, G.: Deterministic finite automaton for scalable traffic identification: the power of compressing by range. In: NOMS 2012, pp. 155–162 (2012). https://doi.org/10.1109/NOMS.2012.6211894

  4. Antonello, R., Fernandes, S.F.L., Sadok, D.F.H., Kelner, J., Szabó, G.: Design and optimizations for efficient regular expression matching in DPI systems. Comput. Commun. 61, 103–120 (2015). https://doi.org/10.1016/j.comcom.2014.12.011

  5. Becchi, M., Cadambi, S.: Memory-efficient regular expression search using state merging. In: Proceedings of 26th INFOCOM, pp. 1064–1072 (2007). https://doi.org/10.1109/INFCOM.2007.128

  6. Becchi, M., Crowley, P.: A hybrid finite automaton for practical deep packet inspection. In: Proceedings of 3rd CoNEXT Conference, pp. 1–12 (2007)

  7. Becchi, M., Crowley, P.: An improved algorithm to accelerate regular expression evaluation. In: Proceedings of ANCS 2007, pp. 145–154 (2007). https://doi.org/10.1145/1323548.1323573

  8. Becchi, M., Crowley, P.: Efficient regular expression evaluation: theory to practice. In: Proceedings of ANCS 2008, pp. 50–59 (2008). https://doi.org/10.1145/1477942.1477950

  9. Becchi, M., Crowley, P.: A-DFA: A time- and space-efficient DFA compression algorithm for fast regular expression evaluation. ACM Trans. Archit. Code Optim. 10(1), 4:1–4:26 (2013). https://doi.org/10.1145/2445572.2445576

  10. Bille, P., Gørtz, I.L., Pedersen, M.R.: Fast practical compression of deterministic finite automata. arXiv:2306.12771 (2024)

  11. Bille, P., Gørtz, I.L., Puglisi, S.J., Tarnow, S.R.: Hierarchical relative Lempel-Ziv compression. In: Proceedings of 21st SEA (2023)

  12. Black, J., Halevi, S., Krawczyk, H., Krovetz, T., Rogaway, P.: UMAC: fast and secure message authentication. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 216–233. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_14

  13. Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of SEQUENCES, pp. 21–29 (1997)

  14. Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60(3), 630–659 (2000). https://doi.org/10.1006/jcss.1999.1690

  15. Brodie, B.C., Taylor, D.E., Cytron, R.K.: A scalable architecture for high-throughput regular-expression pattern matching. In: Proceedings of 33rd ISCA, pp. 191–202 (2006). https://doi.org/10.1109/ISCA.2006.7

  16. Charikar, M.: Similarity estimation techniques from rounding algorithms. In: Proceedings of 34th STOC, pp. 380–388 (2002). https://doi.org/10.1145/509907.509965

  17. Ding, S., Attenberg, J., Suel, T.: Scalable techniques for document identifier assignment in inverted indexes. In: Proceedings of 19th WWW, pp. 311–320 (2010)

  18. Douglis, F., Iyengar, A.: Application-specific delta-encoding via resemblance detection. In: Proceedings of USENIX ATC, General Track 2003, pp. 113–126 (2003). http://www.usenix.org/events/usenix03/tech/douglis.html

  19. Farley, A.M., Hedetniemi, S.T., Proskurowski, A.: Partitioning trees: matching, domination, and maximum diameter. Int. J. Parallel Program. 10(1), 55–61 (1981). https://doi.org/10.1007/BF00978378

  20. Ficara, D., Giordano, S., Procissi, G., Vitucci, F., Antichi, G., Pietro, A.D.: An improved DFA for fast regular expression matching. Comput. Commun. Rev. 38(5), 29–40 (2008). https://doi.org/10.1145/1452335.1452339

  21. Ficara, D., Pietro, A.D., Giordano, S., Procissi, G., Vitucci, F., Antichi, G.: Differential encoding of DFAs for fast regular expression matching. IEEE/ACM Trans. Netw. 19(3), 683–694 (2011). https://doi.org/10.1109/TNET.2010.2089639

  22. Gong, L., Wang, C., Xia, H., Chen, X., Li, X., Zhou, X.: Enabling fast and memory-efficient acceleration for pattern matching workloads: the lightweight automata processing engine. IEEE Trans. Comput. 72(4), 1011–1025 (2023). https://doi.org/10.1109/TC.2022.3187338

  23. Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theory Comput. 8(1), 321–350 (2012). https://doi.org/10.4086/toc.2012.v008a014

  24. Hemmingsen, M., Lam, B.W.: Fast Compression of DFAs for Intrusion Detection Systems. Master’s thesis, Tech. Uni. Denmark (2021)

  25. Kong, S., Smith, R., Estan, C.: Efficient signature matching with multiple alphabet compression tables. In: Proceedings of 4th SECURECOMM, p. 1 (2008). https://doi.org/10.1145/1460877.1460879

  26. Krcál, L., Holub, J.: Incremental locality and clustering-based compression. In: DCC 2015, pp. 203–212 (2015). https://doi.org/10.1109/DCC.2015.23

  27. Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)

  28. Kulkarni, P., Douglis, F., LaVoie, J.D., Tracey, J.M.: Redundancy elimination within large collections of files. In: Proceedings of USENIX ATC, General Track 2004, pp. 59–72 (2004)

  29. Kumar, S., Dharmapurikar, S., Yu, F., Crowley, P., Turner, J.S.: Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In: Proceedings of SIGCOMM 2006, pp. 339–350 (2006). https://doi.org/10.1145/1159913.1159952

  30. Kumar, S., Turner, J.S., Williams, J.: Advanced algorithms for fast and scalable deep packet inspection. In: Proceedings of ANCS 2006, pp. 81–92 (2006). https://doi.org/10.1145/1185347.1185359

  31. Liu, A.X., Torng, E.: An overlay automata approach to regular expression matching. In: Proceedings of 33rd INFOCOM, pp. 952–960 (2014). https://doi.org/10.1109/INFOCOM.2014.6848024

  32. Liu, S., Su, S., Liu, D., Huang, Z., Xiao, M.: Efficient compression algorithm for ternary content addressable memory-based regular expression matching. Electron. Lett. 53(3), 152–154 (2017)

  33. Matousek, D., Kubis, J., Matousek, J., Korenek, J.: Regular expression matching with pipelined delayed input DFAs for high-speed networks. In: Proceedings of ANCS 2018, pp. 104–110 (2018). https://doi.org/10.1145/3230718.3230730

  34. Matousek, D., Matousek, J., Korenek, J.: High-speed regular expression matching with pipelined memory-based automata. In: Proceedings of 26th FCCM, p. 214 (2018). https://doi.org/10.1109/FCCM.2018.00048

  35. Meiners, C.R., Patel, J., Norige, E., Torng, E., Liu, A.X.: Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In: 19th USENIX Security, pp. 111–126 (2010)

  36. Ouyang, Z., Memon, N.D., Suel, T., Trendafilov, D.: Cluster-based delta compression of a collection of files. In: Proceedings of 3rd WISE, pp. 257–268 (2002). https://doi.org/10.1109/WISE.2002.1181662

  37. Patel, J., Liu, A.X., Torng, E.: Bypassing space explosion in high-speed regular expression matching. IEEE/ACM Trans. Netw. 22(6), 1701–1714 (2014). https://doi.org/10.1109/TNET.2014.2309014

  38. Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23–24), 2435–2463 (1999)

  39. Peel, A., Wirth, A., Zobel, J.: Collection-based compression using discovered long matching strings. In: Proceedings of 20th CIKM, pp. 2361–2364 (2011). https://doi.org/10.1145/2063576.2063967

  40. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6), 1389–1401 (1957)

  41. Prithi, S., Sumathi, S.: A survey on recent DFA compression techniques for deep packet inspection in network intrusion detection system. J. Electr. Eng. 17(3), 14–14 (2017)

  42. Qi, Y., et al.: FEACAN: front-end acceleration for content-aware network processing. In: Proceedings of 30th INFOCOM, pp. 2114–2122 (2011). https://doi.org/10.1109/INFCOM.2011.5935021

  43. Roesch, M.: Snort: lightweight intrusion detection for networks. In: Proceedings of 13th LISA, pp. 229–238 (1999)

  44. Roussev, V.: Data fingerprinting with similarity digests. In: IFIP International Conference Digital Forensics 2010, vol. 337, pp. 207–226 (2010). https://doi.org/10.1007/978-3-642-15506-2_15

  45. Shankar, S.S., Lin, P., Herkersdorf, A., Wild, T.: A divide and conquer state grouping method for bitmap based transition compression. In: Proceedings of 18th PDCAT, pp. 400–406 (2017). https://doi.org/10.1109/PDCAT.2017.00071

  46. Shilane, P., Huang, M., Wallace, G., Hsu, W.: WAN-optimized replication of backup datasets using stream-informed delta compression. ACM Trans. Storage 8(4), 13:1–13:26 (2012). https://doi.org/10.1145/2385603.2385606

  47. Tang, Q., Jiang, L., Dai, Q., Su, M., Xie, H., Fang, B.: RICS-DFA: a space and time-efficient signature matching algorithm with reduced input character set. Concurr. Comput. Pract. Exp. 29(20) (2017). https://doi.org/10.1002/cpe.3940

  48. Tuck, N., Sherwood, T., Calder, B., Varghese, G.: Deterministic memory-efficient string matching algorithms for intrusion detection. In: Proceedings of 23rd INFOCOM, pp. 2628–2639 (2004). https://doi.org/10.1109/INFCOM.2004.1354682

  49. Xia, W., Jiang, H., Feng, D., Hua, Y.: Silo: a similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput. In: USENIX ATC 2011 (2011)

  50. Xu, C., Chen, S., Su, J., Yiu, S., Hui, L.C.K.: A survey on regular expression matching for deep packet inspection: applications, algorithms, and hardware platforms. IEEE Commun. Surv. Tutorials 18(4), 2991–3029 (2016). https://doi.org/10.1109/COMST.2016.2566669

  51. Yu, F., Chen, Z., Diao, Y., Lakshman, T.V., Katz, R.H.: Fast and memory-efficient regular expression matching for deep packet inspection. In: Proceedings of ANCS 2006, pp. 93–102 (2006). https://doi.org/10.1145/1185347.1185360

Author information

Corresponding author

Correspondence to Philip Bille.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bille, P., Gørtz, I.L., Pedersen, M.R. (2025). Fast Practical Compression of Deterministic Finite Automata. In: Královič, R., Kůrková, V. (eds) SOFSEM 2025: Theory and Practice of Computer Science. SOFSEM 2025. Lecture Notes in Computer Science, vol 15538. Springer, Cham. https://doi.org/10.1007/978-3-031-82670-2_11

  • DOI: https://doi.org/10.1007/978-3-031-82670-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82669-6

  • Online ISBN: 978-3-031-82670-2

  • eBook Packages: Computer Science, Computer Science (R0)
