skip to main content
10.1145/3018743.3018760acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Public Access

Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation

Published:26 January 2017Publication History

ABSTRACT

Finite State Machine (FSM) is the key kernel behind many popular applications, including regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is extremely difficult because of the strong dependencies and unpredictable memory accesses. Previous efforts have largely focused on multi-core parallelization, and used different approaches, including {\em speculative} and {\em enumerative} execution, both of which have been effective but also have limitations. With increasing width and improving flexibility in SIMD instruction sets, this paper focuses on combining SIMD and multi/many-core parallelism for FSMs. We have developed a novel strategy, called {\em enumerative speculation}. Instead of speculating on a single state as in speculative execution or enumerating all possible states as in enumerative execution, our strategy speculates transitions from several possible states, reducing the prediction overheads of speculation approach and the large amount of redundant work in the enumerative approach. A simple lookback approach produces a set of guessed states to achieve high speculation success rates in our enumerative speculation. We evaluate our method with four popular FSM applications: Huffman decoding, regular expression matching, HTML tokenization, and Div7. We obtain up to 2.5x speedup using SIMD on one core and up to 95x combining SIMD with 60 cores of an Intel Xeon Phi. On a single core, we outperform the best single-state speculative execution version by an average of 1.6x, and in combining SIMD and many-core parallelism, outperform enumerative execution by an average of 2x.

References

  1. K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. Commun. ACM, 52(10):56--67, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. E. Blelloch. Prefix Sums and Their Applications. Technical Report Carnegie Mellon University-CS-90--190, School of Computer Science, Carnegie Mellon University, Nov. 1990.Google ScholarGoogle Scholar
  3. L. Chen, P. Jiang, and G. Agrawal. Exploiting Recent SIMD Architectural Advances for Irregular Applications. In Proceedings of the 2016 International Symposium on Code Generation and Optimization, CGO 2016, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. Franchetti and M. Puschel. A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms. In Proceedings of the 2002 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2002), pages 20 --26. IEEE, 2002. Google ScholarGoogle ScholarCross RefCross Ref
  5. C. Garca, R. Lario, M. Prieto, L. Piuel, and F. Tirado. Vectorization of Multigrid Codes Using SIMD ISA Extensions. Proceedings of the 2003 IEEE Inernational Parallel and Distributed Processing Symposium (IPDPS 2003), 0:58a, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W. D. Hillis and G. L. Steele, Jr. Data Parallel Algorithms. Commun. ACM, 29(12), Dec. 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Holub and S.vStekr. On Parallel Implementations of Deterministic Finite Automata. In Proceedings of the 14th International Conference on Implementation and Application of Automata, CIAA '09, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. N. Ide, M. Hirano, Y. Endo, S. Yoshioka, H. Murakami, A. Kunimatsu, T. Sato, T. Kamei, T. Okada, and M. Suzuoki. 2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3D Graphics Computing. IEEE Journal of Solid-State Circuits, 35(7):1025 --1033, Jul 2000. Google ScholarGoogle ScholarCross RefCross Ref
  9. P. Jiang, L. Chen, and G. Agrawal. Reusing data reorganization for efficient simd parallelization of adaptive irregular applications. In Proceedings of the 2016 International Conference on Supercomputing, ICS '16, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. CuSha: Vertex-centric Graph Processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC '14, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Kim and H. Han. Efficient simd code generation for irregular kernels. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 55--64, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. T. Klein and Y. Wiseman. Parallel Huffman Decoding with Applications to JPEG Files. THE COMPUTER JOURNAL, 46:487--497, 2003. Google ScholarGoogle ScholarCross RefCross Ref
  13. R. E. Ladner and M. J. Fischer. Parallel prefix computation. J. ACM, 27(4), Oct. 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey. Efficient Sparse Matrix-vector Multiplication on x86-based Many-core Processors. ICS '13, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Luchaup, R. Smith, C. Estan, and S. Jha. Speculative Parallel Pattern Matching. IEEE Transactions on Information Forensics and Security, 6(2):438--451, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Merrill, M. Garland, and A. Grimshaw. Scalable gpu graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Mytkowicz, M. Musuvathi, and W. Schulte. Data-parallel finite-state machines. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '14, pages 529--542, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Pan, Y. Zhang, and K. Chiu. Simultaneous Transducers for Data-Parallel XML Parsing. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1--12, 2008.Google ScholarGoogle Scholar
  19. S. J. Pennycook, C. J. Hughes, M. Smelyanskiy, and S. A. Jarvis. Exploring simd for molecular dynamics, using intel xeon processors and intel xeon phi coprocessors. In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, IPDPS '13, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Prabhu, G. Ramalingam, and K. Vaswani. Safe Programmable Speculative Parallelism. In Proceedings of Programming Language Design and Implementation (PLDI). Association for Computing Machinery, Inc., June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Qiu, Z. Zhao, and B. Ren. Microspec: Speculation-centric fine-grained parallelization for fsm computations. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT '16, pages 221--233, New York, NY, USA, 2016. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Rognes and E. Seeberg. Six-Fold Speed-up of Smithwaterman Sequence Database Searches Using Parallel Processing on Common Microprocessors. 16(8):699--706, 2000.Google ScholarGoogle Scholar
  23. E. Saule and U. V. Catalyurek. An early evaluation of the scalability of graph algorithms on the intel mic architecture. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW '12, pages 1629--1639, Washington, DC, USA, 2012. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. W. T. Tang, R. Zhao, M. Lu, Y. Liang, H. P. Huynh, X. Li, and R. S. M. Goh. Optimizing and Auto-tuning Scale-free Sparse Matrix-vector Multiplication on Intel Xeon Phi. In Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '15, 2015. Google ScholarGoogle ScholarCross RefCross Ref
  25. Z. Zhao and X. Shen. On-the-Fly Principled Speculation for FSM Parallelization. SIGARCH Comput. Archit. News, 43(1), Mar. 2015.Google ScholarGoogle Scholar
  26. Z. Zhao, B. Wu, and X. Shen. Challenging the "Embarrassingly Sequential": Parallelizing Finite State Machine-based Computations Through Principled Speculation. SIGPLAN Not., 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
          January 2017
          476 pages
          ISBN:9781450344937
          DOI:10.1145/3018743

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 January 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          PPoPP '17 Paper Acceptance Rate29of132submissions,22%Overall Acceptance Rate230of1,014submissions,23%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader