Research article
DOI: 10.1145/3489517.3530681

Accelerator design with decoupled hardware customizations: benefits and challenges: invited

Published: 23 August 2022 Publication History

Abstract

The past decade has witnessed increasing adoption of high-level synthesis (HLS) to implement specialized hardware accelerators targeting either FPGAs or ASICs. However, current HLS programming models entangle algorithm specifications with hardware customization techniques, which lowers both the productivity and the portability of accelerator designs. To tackle this problem, recent efforts such as HeteroCL propose decoupling the algorithm definition from essential hardware customization techniques in compute, data type, and memory, thereby increasing productivity, portability, and performance.
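To make the decoupling concrete, here is a minimal Python sketch of the idea. It is not HeteroCL's actual API. The algorithm (`vadd`) states only what is computed, while a separate record carries a compute customization (loop splitting) and a data-type customization (a cast applied to each result). The names `vadd`, `Schedule`, `split_factor`, `cast`, and `execute` are hypothetical, chosen for illustration, and memory customizations are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


# Algorithm: *what* to compute, with no hardware detail.
def vadd(a: List[float], b: List[float], i: int) -> float:
    """Element i of a vector add."""
    return a[i] + b[i]


def identity(x: float) -> float:
    return x


# Customizations: *how* to compute, kept in a separate object.
@dataclass
class Schedule:
    """Hypothetical customization record (illustrative, not HeteroCL's API)."""
    split_factor: int = 1                      # compute: split loop into outer/inner
    cast: Callable[[float], float] = identity  # data type: applied to each result


def execute(algo: Callable[[List[float], List[float], int], float],
            a: List[float], b: List[float], s: Schedule) -> List[float]:
    """Interpret the unchanged algorithm under the given schedule."""
    n = len(a)
    out = [0.0] * n
    f = max(1, s.split_factor)
    for outer in range(0, n, f):                   # outer loop over tiles
        for i in range(outer, min(outer + f, n)):  # inner loop; bound handles the tail
            out[i] = s.cast(algo(a, b, i))
    return out


if __name__ == "__main__":
    a = [0.5, 1.25, 2.0, 3.75, 4.5]
    b = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(execute(vadd, a, b, Schedule()))                # [1.5, 2.25, 3.0, 4.75, 5.5]
    print(execute(vadd, a, b, Schedule(split_factor=2)))  # same values, different loop nest
```

Because the schedule never modifies `vadd`, the same algorithm can be retargeted by swapping schedules. This is the productivity and portability argument in miniature.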
While decoupling the algorithm from its customizations benefits the compilation and synthesis process, it also creates new hurdles for programmers who must productively debug and validate the correctness of the optimized design. In this work, using HeteroCL and realistic machine learning applications as case studies, we first explain the key advantages that the decoupled programming model brings to programmers who need to rapidly develop high-performance accelerators. Using the same case studies, we further show how seemingly benign use of the customization primitives can introduce new verification challenges. We then outline the research opportunities and discuss some of our recent efforts as a first step toward a robust and viable verification solution.
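The verification hurdle can be illustrated with a small differential-testing sketch. Here the customization is assumed to be a hypothetical fixed-point cast, and the customized design is compared element by element against the unmodified algorithm. Differential testing is a generic technique chosen for illustration, not the verification approach developed in this work. The functions `reference`, `to_fixed`, `customized`, and `diff_check` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def reference(a: List[float], b: List[float]) -> List[float]:
    """Algorithm specification: c[i] = a[i] * b[i] + 1, in full precision."""
    return [x * y + 1.0 for x, y in zip(a, b)]


def to_fixed(frac_bits: int) -> Callable[[float], float]:
    """Hypothetical data-type customization: truncate to `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    return lambda x: math.floor(x * scale) / scale


def customized(a: List[float], b: List[float],
               q: Callable[[float], float]) -> List[float]:
    """Same algorithm, with inputs and each result cast through the customized type."""
    return [q(q(x) * q(y) + 1.0) for x, y in zip(a, b)]


def diff_check(ref: List[float], got: List[float],
               tol: float) -> List[Tuple[int, float, float]]:
    """Return (index, expected, actual) for every element whose error exceeds tol."""
    return [(i, r, g) for i, (r, g) in enumerate(zip(ref, got)) if abs(r - g) > tol]


if __name__ == "__main__":
    a = [0.3, 1.7, 2.25]
    b = [2.0, 0.5, 4.0]
    ref = reference(a, b)
    coarse = customized(a, b, to_fixed(2))
    print(coarse)                         # [1.5, 1.75, 10.0]
    print(diff_check(ref, coarse, 0.05))  # mismatches at indices 0 and 1
    fine = customized(a, b, to_fixed(8))
    print(diff_check(ref, fine, 0.01))    # []
```

With only 2 fractional bits, truncation silently shifts two of the three outputs by about 0.1, and the check flags them. With 8 fractional bits, the same check passes at a 0.01 tolerance. Choosing that tolerance is itself a specification decision that the checker cannot make on its own.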


Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages
ISBN: 9781450391429
DOI: 10.1145/3489517
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
