DOI: 10.1145/2435264.2435300
research-article

Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations

Published: 11 February 2013

Abstract

Although high-level synthesis improves FPGA productivity by enabling designers to use high-level code, the resulting performance is often significantly worse than register-transfer-level designs. One cause of such limited optimization is that high-level synthesis tools are restricted by multiple possible dependencies due to the undecidability of alias analysis. In this paper, we introduce the Dynafuse optimization, which analyzes dependencies dynamically to resolve aliases and enable runtime circuit optimizations. To resolve aliases, Dynafuse provides a specialized software data structure that dynamically determines definition-use chains between FPGA functions. In addition, Dynafuse statically creates a reconfigurable overlay network that uses detected dependencies to dynamically adjust connections between functions and memories in order to fuse pipelines and exploit data locality. Experimental results show that Dynafuse sped up two existing FPGA applications by 1.6-1.8x when exploiting locality and by 3-5x when fusing pipelines. Furthermore, the speedup from pipeline fusion increases linearly with the number of fused functions, which suggests larger applications will experience larger improvements.
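To make the idea of dynamically determined definition-use chains concrete, the following is a minimal sketch and not the paper's data structure. It assumes each FPGA function call can report the address ranges it reads and writes at run time; the names `Call`, `DependenceTracker`, `submit`, and `overlaps` are hypothetical and chosen only for illustration. A runtime could use the returned producer-consumer edges to decide which functions to connect directly (fusion) or which buffers to keep in on-chip memory (locality).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Call:
    """One invocation of an FPGA function (hypothetical model)."""
    name: str
    reads: tuple   # (start, length) address ranges the call reads
    writes: tuple  # (start, length) address ranges the call writes


def overlaps(a, b):
    """True if two (start, length) ranges share at least one address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


@dataclass
class DependenceTracker:
    """Remembers the last writer of each range and reports def-use edges."""
    last_writer: list = field(default_factory=list)  # [(range, call name)]

    def submit(self, call):
        # A def-use edge exists when this call reads a range that an earlier
        # call wrote. Addresses are concrete at run time, so aliasing is
        # resolved by comparing ranges rather than by static analysis.
        producers = []
        for rng, writer in self.last_writer:
            if any(overlaps(rng, r) for r in call.reads) and writer not in producers:
                producers.append(writer)
        # This call's writes supersede earlier definitions they overlap.
        # Simplification: a partial overwrite drops the whole earlier range.
        self.last_writer = [
            (rng, w) for rng, w in self.last_writer
            if not any(overlaps(rng, wr) for wr in call.writes)
        ]
        self.last_writer.extend((wr, call.name) for wr in call.writes)
        return [(p, call.name) for p in producers]


# Three calls chained through buffers at addresses 64 and 128.
tracker = DependenceTracker()
edges = []
edges += tracker.submit(Call("blur", reads=((0, 64),), writes=((64, 64),)))
edges += tracker.submit(Call("edge", reads=((64, 64),), writes=((128, 64),)))
edges += tracker.submit(Call("thresh", reads=((128, 64),), writes=((192, 64),)))
print(edges)  # [('blur', 'edge'), ('edge', 'thresh')]
```

In this sketch the two detected edges form a chain of three functions, which is the kind of sequence the abstract describes as a candidate for pipeline fusion. How the overlay network is reconfigured from such edges is not modeled here.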


Cited By

  • (2014) A framework for dynamic parallelization of FPGA-accelerated applications. Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, pp. 1-10. DOI: 10.1145/2609248.2609256. Online publication date: 10-Jun-2014.
  • (2014) Mission control: A performance metric and analysis of control logic for pipelined architectures on FPGAs. 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), pp. 1-6. DOI: 10.1109/ReConFig.2014.7032539. Online publication date: Dec-2014.
  • (2014) A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 36-43. DOI: 10.1109/FCCM.2014.23. Online publication date: May-2014.

Published In

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2013
294 pages
ISBN: 9781450318877
DOI: 10.1145/2435264

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. fpga
2. high-level synthesis
3. pipeline fusion

Conference

FPGA '13

Acceptance Rates

Overall Acceptance Rate: 125 of 627 submissions, 20%
