DOI: 10.1145/2554688.2554771
research-article

Combining computation and communication optimizations in system synthesis for streaming applications

Published: 26 February 2014

Abstract

Data streaming is a widely used technique for exploiting task-level parallelism in application domains such as video processing, signal processing, and wireless communication. In this paper we propose an efficient system-level synthesis flow that maps streaming applications onto FPGAs while optimizing computation and communication simultaneously. The throughput of a streaming system depends significantly not only on the performance and number of replicas of the computation kernels, but also on the buffer sizes allocated for communication between kernels. Previous work has generally addressed module selection/replication and buffer size optimization separately. Our approach combines these optimizations in system scheduling, minimizing the area cost of both logic and memory under a required throughput constraint. We first propose an integer linear programming (ILP) based solution to the combined problem, which yields optimal quality of results. We then propose an iterative algorithm that achieves near-optimal quality of results while scaling significantly better to large and complex designs. The key contributions are a polynomial-time algorithm for an exact schedulability checking problem and a polynomial-time algorithm that improves system performance through better module implementation and buffer size optimization. Experimental results show that, compared to the separate scheme of module selection/replication and buffer size optimization, the combined optimization scheme achieves a 62% area saving on average under the same performance requirements. Moreover, our heuristic achieves a speed-up of 2 to 3 orders of magnitude in runtime, with less than 10% area overhead compared to the optimal ILP solution.
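To illustrate why choosing modules and buffer sizes together can beat choosing them one after the other, here is a minimal toy sketch. It is not the paper's ILP formulation or its iterative heuristic. It assumes a single-rate chain of kernels in which each kernel cannot overlap its own firings and each bounded channel exerts back-pressure, so the steady-state period is the largest of every kernel time `t_i` and every `(t_i + t_{i+1}) / b_i`. Replication is left out, and all names and numbers (`candidates`, `slot_area`, the areas and times) are hypothetical.

```python
from itertools import product


def period(times, buffers):
    """Steady-state period of a single-rate chain under the assumed model."""
    bounds = list(times)  # a kernel cannot start a firing before its last one ends
    for i, slots in enumerate(buffers):
        # back-pressure cycle between neighbours i and i+1 through a buffer of `slots`
        bounds.append((times[i] + times[i + 1]) / slots)
    return max(bounds)


def min_buffers(times, target, max_slots):
    """Smallest buffer per channel that meets the target period, else None."""
    if max(times) > target:
        return None
    sizes = []
    for i in range(len(times) - 1):
        pair = times[i] + times[i + 1]
        # integer form of pair / slots <= target, to avoid float comparisons
        fit = [s for s in range(1, max_slots + 1) if pair <= target * s]
        if not fit:
            return None
        sizes.append(fit[0])
    return sizes


def total_area(choice, sizes, slot_area):
    return sum(area for _, area in choice) + slot_area * sum(sizes)


def separate(candidates, target, slot_area, max_slots=8):
    """Pick the cheapest feasible module per kernel first, then size buffers."""
    choice = []
    for impls in candidates:
        feasible = [impl for impl in impls if impl[0] <= target]
        if not feasible:
            return None
        choice.append(min(feasible, key=lambda impl: impl[1]))
    sizes = min_buffers([t for t, _ in choice], target, max_slots)
    if sizes is None:
        return None
    return total_area(choice, sizes, slot_area), tuple(choice), sizes


def joint(candidates, target, slot_area, max_slots=8):
    """Exhaustively search module choices, sizing buffers for each one."""
    best = None
    for choice in product(*candidates):
        sizes = min_buffers([t for t, _ in choice], target, max_slots)
        if sizes is None:
            continue
        area = total_area(choice, sizes, slot_area)
        if best is None or area < best[0]:
            best = (area, choice, sizes)
    return best


# Hypothetical instance: two kernels, each with a slow/small and a fast/large
# implementation given as (time per firing, logic area); one buffer slot costs 10.
candidates = [[(4, 10), (2, 14)], [(4, 10), (2, 14)]]
print(separate(candidates, target=4, slot_area=10))
print(joint(candidates, target=4, slot_area=10))
```

In this made-up instance the separate flow keeps both slow modules and then needs a two-slot buffer (total area 40), while the joint search pays for the two faster modules and gets by with a single slot (total area 38). Exhaustive enumeration is exponential in the number of kernels, which is the kind of scalability limit that motivates a heuristic for larger designs.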

      Published In

      FPGA '14: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
      February 2014
      272 pages
      ISBN:9781450326711
      DOI:10.1145/2554688
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. buffer size optimization
      2. fpga
      3. module duplication
      4. module selection
      5. streaming applications
      6. system-level synthesis

      Qualifiers

      • Research-article

      Conference

      FPGA'14

      Acceptance Rates

      FPGA '14 Paper Acceptance Rate 30 of 110 submissions, 27%;
      Overall Acceptance Rate 125 of 627 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 18
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 20 Feb 2025

      Citations

      Cited By

      • (2023) Streaming Task Graph Scheduling for Dataflow Architectures. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 225-237. DOI: 10.1145/3588195.3592999. Online publication date: 7-Aug-2023
      • (2023) Callipepla: Stream Centric Instruction Set and Mixed Precision for Accelerating Conjugate Gradient Solver. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 247-258. DOI: 10.1145/3543622.3573182. Online publication date: 12-Feb-2023
      • (2023) An Efficient Scheduling Algorithm for Stream Computing. 2023 IEEE 15th International Conference on ASIC (ASICON), pp. 1-4. DOI: 10.1109/ASICON58565.2023.10396024. Online publication date: 24-Oct-2023
      • (2022) Fast Energy-Optimal Multikernel DNN-Like Application Allocation on Multi-FPGA Platforms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41:4, pp. 1186-1190. DOI: 10.1109/TCAD.2021.3076958. Online publication date: Apr-2022
      • (2021) DYNAMAP. The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 183-193. DOI: 10.1145/3431920.3439286. Online publication date: 17-Feb-2021
      • (2021) Energy-Efficient System Design of Asymmetric Multiprocessor for Real-Time Streaming Applications. 2021 International Conference on Intelligent Technology and Embedded Systems (ICITES), pp. 44-51. DOI: 10.1109/ICITES53477.2021.9637078. Online publication date: 31-Oct-2021
      • (2020) Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms. IEEE Transactions on Circuits and Systems II: Express Briefs, 67:12, pp. 3073-3077. DOI: 10.1109/TCSII.2020.2998284. Online publication date: Dec-2020
      • (2020) Predictive Compositional Method to Design and Reoptimize Complex Behavioral Dataflows. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39:10, pp. 2615-2627. DOI: 10.1109/TCAD.2020.2966447. Online publication date: Oct-2020
      • (2019) Exact and Heuristic Allocation of Multi-kernel Applications to Multi-FPGA Platforms. Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1-6. DOI: 10.1145/3316781.3317821. Online publication date: 2-Jun-2019
      • (2019) SkyCastle: A Resource-Aware Multi-Loop Scheduler for High-Level Synthesis. 2019 International Conference on Field-Programmable Technology (ICFPT), pp. 36-44. DOI: 10.1109/ICFPT47387.2019.00013. Online publication date: Dec-2019