Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs

Published: 13 July 2017

Abstract

With the unique feature of fine-grained parallelism, field-programmable gate arrays (FPGAs) show great potential for streaming algorithm acceleration. However, the lack of a design framework, restrictions on FPGAs, and ineffective tools impede the utilization of FPGAs in practice. In this study, we provide a design paradigm to support streaming algorithm acceleration on FPGAs. We first propose an abstract model to describe streaming algorithms with homogeneous sub-functions (HSF) and stable data dependency (SDD), which we call the HSF-SDD model. Using this model, we then develop an FPGA framework, PE-Ring, that has the advantages of (1) fully exploiting algorithm parallelism to achieve high performance, (2) leveraging block RAM to serve large-scale parameters, and (3) enabling flexible parameter adjustments. Based on the proposed model and framework, we finally implement a specific converter to generate the register-transfer level representation of the PE-Ring. Experimental results show that our method outperforms ordinary FPGA design tools by one to two orders of magnitude. Experiments also demonstrate the scalability of the PE-Ring.

Cited By

  • (2022) Characterizing the Effect of Deadline Misses on Time-Triggered Task Chains. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11 (3957-3968). DOI: 10.1109/TCAD.2022.3199146. Online publication date: 1-Nov-2022

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 16, Issue 4
      Special Issue on Secure and Fault-Tolerant Embedded Computing and Regular Papers
      November 2017
      614 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3092956
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 July 2017
      Accepted: 01 April 2017
      Revised: 01 February 2017
      Received: 01 October 2016
      Published in TECS Volume 16, Issue 4

      Author Tags

      1. Stream processing
      2. algorithm model
      3. high-level synthesis

      Qualifiers

      • Research-article
      • Research
      • Refereed
