Abstract
In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords from a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at high throughput. The technique continues to benefit from any future increase in the number of threads or in SIMD width. I show the approach's viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver performance seen, so far, only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. They also achieve near-ideal resource utilization (99.2%). The approach is applicable to any SIMD-enabled processor and fits the trend toward wider SIMD units in contemporary architecture design.
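The idea of advancing several independent streams through one tokenizer automaton in lockstep can be illustrated with a minimal sketch. It is not the Cell/B.E. kernels described above: the three character classes, the three-state transition table and the name `tokenize_lockstep` are assumptions made for illustration, and the inner loop over lanes is plain scalar Python standing in for what a SIMD unit would do with a single vector instruction per step.

```python
# Hypothetical character classes, chosen only for this illustration.
OTHER, LETTER, DIGIT = 0, 1, 2

# Hypothetical DFA states: outside any token, inside a word, inside a number.
OUTSIDE, WORD, NUMBER = 0, 1, 2

# TRANSITIONS[state][char_class] gives the next state.
TRANSITIONS = [
    [OUTSIDE, WORD, NUMBER],  # from OUTSIDE
    [OUTSIDE, WORD, WORD],    # from WORD: digits continue the word
    [OUTSIDE, WORD, NUMBER],  # from NUMBER: a letter starts a new word
]


def char_class(ch):
    """Map a character to one of the three illustrative classes."""
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    return OTHER


def tokenize_lockstep(streams):
    """Tokenize several independent strings, one DFA step per lane per position.

    Each stream plays the role of one SIMD lane. All lanes share the same
    transition table and advance by exactly one character per outer iteration.
    """
    n = len(streams)
    length = max((len(s) for s in streams), default=0)
    # Pad with spaces so all lanes run the same number of steps and the
    # trailing token of every stream is flushed by a final separator.
    padded = [s.ljust(length + 1) for s in streams]

    states = [OUTSIDE] * n
    starts = [0] * n
    tokens = [[] for _ in range(n)]

    for i in range(length + 1):
        # On SIMD hardware this loop over lanes would be one vector operation.
        for lane in range(n):
            prev = states[lane]
            nxt = TRANSITIONS[prev][char_class(padded[lane][i])]
            if nxt != prev:
                if prev != OUTSIDE:
                    # Leaving a token: emit the characters seen since its start.
                    tokens[lane].append(padded[lane][starts[lane]:i])
                if nxt != OUTSIDE:
                    # Entering a token: remember where it begins.
                    starts[lane] = i
            states[lane] = nxt
    return tokens


if __name__ == "__main__":
    for lane_tokens in tokenize_lockstep(["ab 12", "x9 7y"]):
        print(lane_tokens)
```

Because every lane performs the same table lookup at every step regardless of its input, the work per step is uniform across lanes, which is the property that lets wider SIMD units or more threads translate into more streams processed at once.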
Scarpazza, D.P. Top-Performance Tokenization and Small-Ruleset Regular Expression Matching. Int J Parallel Prog 39, 3–32 (2011). https://doi.org/10.1007/s10766-010-0147-0
DOI: https://doi.org/10.1007/s10766-010-0147-0