DOI: 10.1145/2435264.2435300
research-article

Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations

Published: 11 February 2013

ABSTRACT

Although high-level synthesis improves FPGA productivity by enabling designers to use high-level code, the resulting performance is often significantly worse than register-transfer-level designs. One cause of such limited optimization is that high-level synthesis tools are restricted by multiple possible dependencies due to the undecidability of alias analysis. In this paper, we introduce the Dynafuse optimization, which analyzes dependencies dynamically to resolve aliases and enable runtime circuit optimizations. To resolve aliases, Dynafuse provides a specialized software data structure that dynamically determines definition-use chains between FPGA functions. In addition, Dynafuse statically creates a reconfigurable overlay network that uses detected dependencies to dynamically adjust connections between functions and memories in order to fuse pipelines and exploit data locality. Experimental results show that Dynafuse sped up two existing FPGA applications by 1.6-1.8x when exploiting locality and by 3-5x when fusing pipelines. Furthermore, the speedup from pipeline fusion increases linearly with the number of fused functions, which suggests larger applications will experience larger improvements.
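The abstract does not describe the internals of the dependence-tracking data structure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each FPGA function call reports the base addresses of the buffers it reads and writes; the class name `DependenceTracker`, its methods, and the function names `blur`, `sobel`, and `threshold` are hypothetical. Because buffers are keyed by their actual runtime address, two pointers that alias the same buffer resolve to the same entry, which is what a static analysis often cannot prove.

```python
class DependenceTracker:
    """Hypothetical runtime tracker of definition-use chains between calls."""

    def __init__(self):
        self.last_writer = {}  # buffer base address -> name of last writing call
        self.edges = []        # (producer, consumer, buffer address)

    def record_call(self, func, reads, writes):
        # A read of a buffer that an earlier call wrote forms a def-use edge.
        for addr in reads:
            producer = self.last_writer.get(addr)
            if producer is not None:
                self.edges.append((producer, func, addr))
        # Record this call as the latest definition of each written buffer.
        for addr in writes:
            self.last_writer[addr] = func

    def fusion_chains(self):
        """Return linear producer -> consumer chains as fusion candidates."""
        succ, pred = {}, {}
        for producer, consumer, _ in self.edges:
            succ.setdefault(producer, set()).add(consumer)
            pred.setdefault(consumer, set()).add(producer)
        chains = []
        for start in succ:
            # Skip calls that sit in the interior of a chain.
            if start in pred and len(pred[start]) == 1:
                (only_pred,) = pred[start]
                if len(succ[only_pred]) == 1:
                    continue
            chain, node = [start], start
            # Extend while the link is one-to-one in both directions.
            while len(succ.get(node, ())) == 1:
                (nxt,) = succ[node]
                if len(pred[nxt]) != 1:
                    break
                chain.append(nxt)
                node = nxt
            if len(chain) > 1:
                chains.append(chain)
        return chains


# Usage with made-up addresses; alias_of_tmp models a second pointer to TMP.
IMG, TMP, OUT = 0x1000, 0x2000, 0x3000
alias_of_tmp = TMP
tracker = DependenceTracker()
tracker.record_call("blur", reads=[IMG], writes=[TMP])
tracker.record_call("sobel", reads=[alias_of_tmp], writes=[OUT])
tracker.record_call("threshold", reads=[OUT], writes=[OUT])
print(tracker.fusion_chains())
```

In this sketch a chain of one-to-one producer and consumer calls is treated as a candidate for pipeline fusion; how Dynafuse actually selects candidates and configures its overlay network is not specified in the abstract.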


Published in

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2013
294 pages
ISBN: 9781450318877
DOI: 10.1145/2435264

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 125 of 627 submissions, 20%
