research-article

Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations

Authors:

Greg Stitt

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Pages 201-210

https://doi.org/10.1145/2435264.2435300

Published: 11 February 2013

Abstract

Although high-level synthesis improves FPGA productivity by enabling designers to use high-level code, the resulting performance is often significantly worse than that of register-transfer-level designs. One cause of such limited optimization is that high-level synthesis tools are restricted by potential dependencies that cannot be ruled out, because alias analysis is undecidable. In this paper, we introduce the Dynafuse optimization, which analyzes dependencies dynamically to resolve aliases and enable runtime circuit optimizations. To resolve aliases, Dynafuse provides a specialized software data structure that dynamically determines definition-use chains between FPGA functions. In addition, Dynafuse statically creates a reconfigurable overlay network that uses the detected dependencies to dynamically adjust connections between functions and memories in order to fuse pipelines and exploit data locality. Experimental results show that Dynafuse sped up two existing FPGA applications by 1.6-1.8x when exploiting locality and by 3-5x when fusing pipelines. Furthermore, the speedup from pipeline fusion increases linearly with the number of fused functions, which suggests that larger applications will experience larger improvements.
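The idea of determining definition-use chains at runtime can be illustrated with a minimal Python sketch. It is not Dynafuse's actual data structure, which the abstract does not describe in detail: the class name `DefUseTracker`, its `call` method, the function names, and the modeling of each pointer argument as a concrete byte range are all hypothetical. The point it shows is that, although a compiler cannot always prove whether two pointers alias, the address ranges actually passed to each FPGA function are known at runtime and can be compared directly; whenever a function reads bytes that an earlier function wrote, a producer-consumer edge is recorded.

```python
from dataclasses import dataclass, field


@dataclass
class DefUseTracker:
    """Hypothetical runtime def-use tracker for calls to FPGA functions.

    Each pointer argument is modeled as a concrete byte range (start, length),
    which is only known at runtime; comparing these ranges resolves aliases
    that a static compiler analysis might have to treat conservatively.
    """

    # Live definitions: (start, end, writer) with end exclusive; ranges never overlap.
    defs: list[tuple[int, int, str]] = field(default_factory=list)
    # Detected definition-use chains as (producer, consumer) pairs, in discovery order.
    chains: list[tuple[str, str]] = field(default_factory=list)

    def call(self, func: str, reads: list[tuple[int, int]],
             writes: list[tuple[int, int]]) -> None:
        # Reads are checked first, so an in-place function sees the previous writer.
        for start, length in reads:
            end = start + length
            for d_start, d_end, writer in self.defs:
                if start < d_end and d_start < end:  # byte ranges overlap
                    edge = (writer, func)
                    if edge not in self.chains:
                        self.chains.append(edge)
        # Each write becomes the newest definition of its bytes; older definitions
        # are trimmed so only the bytes that were not overwritten keep their writer.
        for start, length in writes:
            end = start + length
            kept: list[tuple[int, int, str]] = []
            for d_start, d_end, writer in self.defs:
                if d_end <= start or d_start >= end:
                    kept.append((d_start, d_end, writer))  # untouched
                    continue
                if d_start < start:
                    kept.append((d_start, start, writer))  # prefix survives
                if d_end > end:
                    kept.append((end, d_end, writer))  # suffix survives
            kept.append((start, end, func))
            self.defs = kept


# Usage with hypothetical buffer addresses and function names.
tracker = DefUseTracker()
buf_a, buf_b = 0x1000, 0x2000
tracker.call("filter", reads=[(buf_a, 4096)], writes=[(buf_b, 4096)])
tracker.call("threshold", reads=[(buf_b, 4096)], writes=[(buf_a, 4096)])
# A pointer into the middle of buf_a still resolves to its actual producer.
tracker.call("histogram", reads=[(buf_a + 1024, 512)], writes=[])
print(tracker.chains)  # [('filter', 'threshold'), ('threshold', 'histogram')]
```

Because `histogram` reads through a pointer offset into `buf_a`, a static analysis that cannot prove which buffer the pointer targets might have to assume a dependence on both earlier calls; the runtime ranges show it depends only on `threshold`. Deciding whether a detected chain is then used for pipeline fusion or for data locality is a separate policy that this sketch does not model.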

Cited By

Fowers J, Liu J, Stitt G, Corporaal H, Stuijk S (2014). A framework for dynamic parallelization of FPGA-accelerated applications. Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, pp. 1-10. Online publication date: 10-Jun-2014. https://dl.acm.org/doi/10.1145/2609248.2609256

Skalicky S, Lopez S, Lukowiak M, Wood C (2014). Mission control: A performance metric and analysis of control logic for pipelined architectures on FPGAs. 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), pp. 1-6. Online publication date: Dec-2014. https://doi.org/10.1109/ReConFig.2014.7032539

Fowers J, Ovtcharov K, Strauss K, Chung E, Stitt G (2014). A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 36-43. Online publication date: May-2014. https://doi.org/10.1109/FCCM.2014.23

Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Datapath optimization
    2. Logic synthesis
      1. Circuit optimization

Information

Published In

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

February 2013

294 pages

ISBN:9781450318877

DOI:10.1145/2435264

General Chair: Brad Hutchings, Brigham Young University, USA

Program Chair: Vaughn Betz, University of Toronto, Canada

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

FPGA '13

Sponsor:

SIGDA

FPGA '13: The 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 11 - 13, 2013

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics

Total citations: 3
Total downloads: 264
Downloads (last 12 months): 10
Downloads (last 6 weeks): 0

Reflects downloads up to 18 Feb 2025.
