DOI: 10.1145/2463209.2488747
research-article

Throughput-oriented kernel porting onto FPGAs

Published: 29 May 2013

ABSTRACT

Reconfigurable devices are often employed in heterogeneous systems due to their low power and parallel processing advantages. An important usability requirement is the support of a homogeneous programming interface. Nevertheless, homogeneous programming interfaces do not eliminate the need for code tweaking to enable efficient mapping of the computation across heterogeneous architectures. In this work we propose a code optimization framework which analyzes and restructures CUDA kernels that are optimized for GPU devices in order to facilitate synthesis of high-throughput custom accelerators on FPGAs. The proposed framework enables efficient performance porting without manual code tweaking or annotation by the user. A hierarchical region graph in tandem with code motions and graph coloring of array variables is employed to restructure the kernel for high throughput execution on FPGAs.
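
The abstract does not spell out how graph coloring is applied to array variables, so the following is only a minimal, generic sketch of the idea, not the framework's actual algorithm. It assumes each array has a known live range expressed as a pair of region indices, treats two arrays as interfering when their ranges overlap, and greedily assigns each array the lowest storage slot ("color") not used by an interfering neighbor. The array names, the live-range encoding, and the functions `build_interference` and `color_arrays` are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: array name -> (first_region, last_region), both inclusive.
LiveRanges = Dict[str, Tuple[int, int]]


def build_interference(live: LiveRanges) -> Dict[str, Set[str]]:
    """Connect two arrays when their live ranges overlap."""
    graph: Dict[str, Set[str]] = {name: set() for name in live}
    names = list(live)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live[a]
            b_start, b_end = live[b]
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color_arrays(live: LiveRanges) -> Dict[str, int]:
    """Greedy coloring: visit high-degree arrays first, take the lowest free color."""
    graph = build_interference(live)
    # Ties are broken by name so the result is deterministic.
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    colors: Dict[str, int] = {}
    for name in order:
        taken = {colors[n] for n in graph[name] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[name] = color
    return colors


if __name__ == "__main__":
    # Hypothetical kernel with four on-chip arrays.
    live = {
        "tile_a": (0, 2),
        "tile_b": (1, 3),
        "partial": (4, 6),
        "result": (5, 7),
    }
    print(color_arrays(live))
```

In this example `tile_a` and `partial` are never live at the same time, so they receive the same color and could share one on-chip buffer, as could `tile_b` and `result`. Greedy coloring is a simple heuristic and does not guarantee the minimum number of colors in general.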

Published in: DAC '13: Proceedings of the 50th Annual Design Automation Conference, May 2013, 1285 pages
ISBN: 9781450320719
DOI: 10.1145/2463209

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
