Top-Performance Tokenization and Small-Ruleset Regular Expression Matching

Scarpazza, Daniele Paolo

doi:10.1007/s10766-010-0147-0

A Quantitative Performance Analysis and Optimization Study on the Cell/B.E. Processor

Published: 07 September 2010

Volume 39, pages 3–32, (2011)

International Journal of Parallel Programming

Daniele Paolo Scarpazza^1,2

Abstract

In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords from a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput. The technique benefits indefinitely from any future scaling in the number of threads or SIMD width. I show the approach’s viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver performance seen so far only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. They also achieve almost-ideal resource utilization (99.2%). The approach is applicable to any SIMD-enabled processor and fits well with the trend toward wider SIMD units in contemporary architecture design.
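The idea of processing multiple independent streams at once can be illustrated with a minimal sketch. It is not the paper's Cell/B.E. kernel: it assumes a toy two-state automaton that treats runs of alphanumeric characters as tokens, and it models each SIMD lane as an entry in a Python list. All names (`tokenize_lockstep`, `TRANSITION`, `char_class`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: a two-state DFA word tokenizer advanced in lockstep
# over several independent streams. Names and the automaton are assumptions,
# not taken from the paper.

OUTSIDE, INSIDE = 0, 1

def char_class(ch):
    """Return class 1 for word characters (letters/digits), class 0 otherwise."""
    return 1 if ch.isalnum() else 0

# TRANSITION[state][char_class] -> next state
TRANSITION = (
    (OUTSIDE, INSIDE),  # from OUTSIDE
    (OUTSIDE, INSIDE),  # from INSIDE
)

def tokenize_lockstep(streams):
    """Tokenize every stream, stepping all of them one character at a time."""
    lanes = len(streams)
    states = [OUTSIDE] * lanes          # one DFA state per lane
    starts = [0] * lanes                # start offset of the token in progress
    tokens = [[] for _ in range(lanes)]
    longest = max((len(s) for s in streams), default=0)

    for pos in range(longest + 1):
        # This inner loop is the step a SIMD unit would perform for all lanes
        # at once; here it is written as a plain scalar loop.
        for lane in range(lanes):
            text = streams[lane]
            # Past the end of a stream, feed a non-word class to flush the last token.
            cls = char_class(text[pos]) if pos < len(text) else 0
            nxt = TRANSITION[states[lane]][cls]
            if states[lane] == OUTSIDE and nxt == INSIDE:
                starts[lane] = pos
            elif states[lane] == INSIDE and nxt == OUTSIDE:
                tokens[lane].append(text[starts[lane]:pos])
            states[lane] = nxt
    return tokens

if __name__ == "__main__":
    print(tokenize_lockstep(["hello, world", "a b", ""]))
```

Because the lanes never exchange data, the same table lookup applies to every lane at each step, which is why this structure is a natural fit for wider SIMD units or additional threads.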

Author information

Authors and Affiliations

Business Analytics and Math Department, IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Daniele Paolo Scarpazza
D. E. Shaw Research, 120 West 45th Street, 39th Floor, New York, NY 10036, USA
Daniele Paolo Scarpazza

Corresponding author

Correspondence to Daniele Paolo Scarpazza.

About this article

Cite this article

Scarpazza, D.P. Top-Performance Tokenization and Small-Ruleset Regular Expression Matching. Int J Parallel Prog 39, 3–32 (2011). https://doi.org/10.1007/s10766-010-0147-0

Received: 13 January 2010
Accepted: 06 July 2010
Published: 07 September 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10766-010-0147-0
