ABSTRACT
Although high-level synthesis improves FPGA productivity by enabling designers to use high-level code, the resulting performance is often significantly worse than that of register-transfer-level designs. One cause of this limited optimization is that high-level synthesis tools are restricted by the many possible dependencies they must assume, because alias analysis is undecidable. In this paper, we introduce the Dynafuse optimization, which analyzes dependencies dynamically to resolve aliases and enable runtime circuit optimizations. To resolve aliases, Dynafuse provides a specialized software data structure that dynamically determines definition-use chains between FPGA functions. In addition, Dynafuse statically creates a reconfigurable overlay network that uses the detected dependencies to dynamically adjust connections between functions and memories, fusing pipelines and exploiting data locality. Experimental results show that Dynafuse sped up two existing FPGA applications by 1.6x to 1.8x when exploiting locality and by 3x to 5x when fusing pipelines. Furthermore, the speedup from pipeline fusion increases linearly with the number of fused functions, which suggests that larger applications will experience larger improvements.
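The idea of resolving aliases at run time can be illustrated with a small sketch. The abstract does not describe the actual Dynafuse data structure, so everything below is an assumption made for illustration: the `Buffer` and `DefUseTracker` names, the use of address-range overlap as the alias test, and the rule that a producer with a single consumer (and vice versa) is a fusion candidate. The sketch records each FPGA function call with the buffers it reads and writes, links a reader to the most recent writer of an overlapping range, and then reports chains of calls that could in principle be fused into one pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Buffer:
    """A memory region known only at run time (hypothetical helper)."""
    base: int
    size: int

    def overlaps(self, other: "Buffer") -> bool:
        # Two half-open ranges [base, base + size) intersect.
        return (self.base < other.base + other.size
                and other.base < self.base + self.size)


@dataclass
class DefUseTracker:
    """Illustrative run-time def-use tracker; not the paper's structure."""
    calls: list = field(default_factory=list)  # (name, reads, writes)
    edges: list = field(default_factory=list)  # (producer idx, consumer idx)

    def record(self, name, reads, writes):
        idx = len(self.calls)
        for r in reads:
            # Walk backwards to find the most recent call that wrote
            # a range overlapping this read (the reaching definition).
            for p in range(idx - 1, -1, -1):
                if any(w.overlaps(r) for w in self.calls[p][2]):
                    self.edges.append((p, idx))
                    break
        self.calls.append((name, list(reads), list(writes)))
        return idx

    def fusable_chains(self):
        consumers, producers = {}, {}
        for p, c in self.edges:
            consumers.setdefault(p, set()).add(c)
            producers.setdefault(c, set()).add(p)

        def sole_next(i):
            # Fuse only when the link is one-to-one (assumed rule).
            nxt = consumers.get(i, set())
            if len(nxt) != 1:
                return None
            (c,) = nxt
            return c if len(producers[c]) == 1 else None

        chains, seen = [], set()
        for i in range(len(self.calls)):
            if i in seen:
                continue
            chain = [i]
            while (n := sole_next(chain[-1])) is not None:
                chain.append(n)
            seen.update(chain)
            if len(chain) > 1:
                chains.append([self.calls[j][0] for j in chain])
        return chains


# Usage: three calls whose pointers are only known at run time.
tracker = DefUseTracker()
src, a, b, c = (Buffer(0x0000, 256), Buffer(0x1000, 256),
                Buffer(0x2000, 256), Buffer(0x3000, 256))
a_alias = Buffer(0x1080, 64)  # a different pointer into the same region
tracker.record("blur", [src], [a])
tracker.record("sobel", [a_alias], [b])
tracker.record("threshold", [b], [c])
print(tracker.fusable_chains())
```

In this toy run, the alias `a_alias` is matched to the output of `blur` only because the concrete addresses are available, which is the kind of information a static analysis may be unable to prove. How detected chains would drive the overlay network is outside the scope of this sketch.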
Index Terms
- Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations