DOI: 10.1145/3445814.3446705

Scalable FSM parallelization via path fusion and higher-order speculation

Published: 17 April 2021

Abstract

The finite-state machine (FSM) is a fundamental computation model used by many applications. However, FSM execution is known to be “embarrassingly sequential” because of the state dependences among transitions. Existing solutions break these dependences with enumerative or speculative parallelization, but the efficiency of both schemes depends heavily on the properties of the FSM and its inputs. For FSMs and inputs with unfavorable properties, the former suffers from the overhead of maintaining multiple execution paths, while the latter is bottlenecked by the serial reprocessing of misspeculated cases. Either way, the scalability of FSM parallelization is seriously compromised.
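To make the enumerative scheme concrete, here is a minimal Python sketch. The three-state transition table, the function names, and the chunking policy are all hypothetical and are not taken from the paper. Each chunk is processed from every possible start state (the multiple execution paths mentioned above), and the per-chunk maps are then composed in order. The thread pool only illustrates the structure; it is not meant to demonstrate real speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 3-state DFA over the alphabet {"a", "b"}:
# TABLE[state][symbol] -> next state.
TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
STATES = sorted(TABLE)


def run(state, chunk):
    """Sequential FSM execution: each transition depends on the previous state."""
    for sym in chunk:
        state = TABLE[state][sym]
    return state


def enumerate_chunk(chunk):
    """Run the chunk from every possible start state (one path per state)."""
    return {s: run(s, chunk) for s in STATES}


def enumerative_run(start, text, n_chunks):
    """Split the input, enumerate each chunk independently, then compose."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(enumerate_chunk, chunks))
    state = start
    for m in maps:  # cheap serial pass: one lookup per chunk
        state = m[state]
    return state


if __name__ == "__main__":
    text = "abbaabab"
    print(run(0, text), enumerative_run(0, text, 3))
```

The cost this sketch exposes is the one the abstract names: every chunk carries one execution path per state, so the work per input symbol grows with the number of states.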
This work addresses these scalability challenges with two novel techniques. First, for enumerative parallelization, it proposes path fusion. Inspired by the classic NFA-to-DFA conversion, path fusion maps a vector of states in the original FSM to a new (fused) state. In this way, it can reduce multiple FSM execution paths to a single path, minimizing the overhead of path maintenance. Second, for speculative parallelization, this work introduces higher-order speculation to avoid serial reprocessing during validation. This is a generalized speculation model that allows speculated states to be validated speculatively. Finally, this work integrates different FSM parallelization schemes into a framework, BoostFSM, which automatically selects the best scheme based on the relevant properties of the FSM. Evaluation using real-world FSMs with diverse characteristics shows that BoostFSM raises the average speedup from 3.1× and 15.4× for the existing speculative and enumerative parallelization schemes, respectively, to 25.8× on a 64-core machine.
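The idea of mapping a vector of states to a single fused state can be sketched in Python as follows. This is an illustration only, assuming a lazily built, memoized fused transition table in the spirit of subset construction; the class name `FusedFSM`, its methods, and the three-state table are hypothetical, and the construction used in BoostFSM may differ.

```python
# Hypothetical 3-state DFA over the alphabet {"a", "b"}:
# TABLE[state][symbol] -> next state.
TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


class FusedFSM:
    """Treats a vector of original states as one fused state (assumed design)."""

    def __init__(self, table):
        self.table = table
        self.ids = {}      # state vector -> fused state id
        self.vectors = []  # fused state id -> state vector
        self.trans = {}    # (fused state id, symbol) -> fused state id

    def fuse(self, vector):
        """Return the fused state id for a vector, creating it if needed."""
        vector = tuple(vector)
        if vector not in self.ids:
            self.ids[vector] = len(self.vectors)
            self.vectors.append(vector)
        return self.ids[vector]

    def step(self, fid, sym):
        """One fused transition; computed element-wise once, then memoized."""
        key = (fid, sym)
        if key not in self.trans:
            nxt = tuple(self.table[s][sym] for s in self.vectors[fid])
            self.trans[key] = self.fuse(nxt)
        return self.trans[key]

    def run(self, vector, chunk):
        """Follow a single fused path instead of len(vector) separate paths."""
        fid = self.fuse(vector)
        for sym in chunk:
            fid = self.step(fid, sym)
        return self.vectors[fid]


if __name__ == "__main__":
    fsm = FusedFSM(TABLE)
    print(fsm.run((0, 1, 2), "ab"))
```

Once a fused transition has been memoized, advancing all paths costs one table lookup per input symbol rather than one lookup per path, which is the path-maintenance overhead the abstract refers to.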

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FSM
  2. Finite State Machine
  3. Parallelization
  4. Scalability
  5. Speculation

Qualifiers

  • Research-article

Conference

ASPLOS '21

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Cited By

  • (2025) Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching. Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 255-270. DOI: 10.1145/3696443.3708916. Online publication date: 1-Mar-2025
  • (2024) One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 193-206. DOI: 10.1109/CGO57630.2024.10444810. Online publication date: 2-Mar-2024
  • (2023) Search-Based Regular Expression Inference on a GPU. Proceedings of the ACM on Programming Languages 7:PLDI, 1317-1339. DOI: 10.1145/3591274. Online publication date: 6-Jun-2023
  • (2023) Asynchronous Automata Processing on GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:1, 1-27. DOI: 10.1145/3579453. Online publication date: 2-Mar-2023
  • (2023) Parallel Pattern Matching over Brotli Compressed Network Traffic. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 477-484. DOI: 10.1109/TrustCom60117.2023.00079. Online publication date: 1-Nov-2023
  • (2023) YARB: a Methodology to Characterize Regular Expression Matching on Heterogeneous Systems. 2023 IEEE International Symposium on Circuits and Systems (ISCAS), 1-5. DOI: 10.1109/ISCAS46773.2023.10181547. Online publication date: 21-May-2023
  • (2022) GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 481-491. DOI: 10.1109/IPDPS53621.2022.00053. Online publication date: May-2022
  • (2022) A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 215-225. DOI: 10.1109/HiPC56025.2022.00038. Online publication date: Dec-2022
