research-article

Accelerator design with decoupled hardware customizations: benefits and challenges: invited

Authors: Niansong Zhang, Hongzheng Chen, Pasquale Cocchini, Louis-Noël Pouchet, Zhiru Zhang

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 1351 - 1354

https://doi.org/10.1145/3489517.3530681

Published: 23 August 2022

Abstract

The past decade has witnessed increasing adoption of high-level synthesis (HLS) to implement specialized hardware accelerators targeting either FPGAs or ASICs. However, current HLS programming models entangle algorithm specifications with hardware customization techniques, which lowers both the productivity and portability of the accelerator design. To tackle this problem, recent efforts such as HeteroCL propose to decouple algorithm definition from essential hardware customization techniques in compute, data type, and memory, increasing productivity, portability, and performance.

While decoupling the algorithm from its customizations benefits the compilation/synthesis process, it also creates new hurdles for programmers who need to debug and validate the correctness of the optimized design productively. In this work, using HeteroCL and realistic machine learning applications as case studies, we first explain the key advantages that the decoupled programming model offers a programmer for rapidly developing high-performance accelerators. Using the same case studies, we then show how seemingly benign usage of the customization primitives can lead to new verification challenges. Finally, we outline the research opportunities and discuss some of our recent efforts as a first step toward a robust and viable verification solution.
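
The verification concern can be made concrete with a small sketch. The example below is not taken from the paper and does not use the HeteroCL API; it is plain Python, and the names `dot`, `dot_tiled`, and `to_fixed` are hypothetical. It assumes a dot product as the algorithm, loop tiling as a compute customization, and rounding to a fixed-point grid as a data type customization. Under those assumptions, the tiled version matches the reference exactly, while the quantized version does not, so a simple equality check against the original algorithm is no longer a sufficient correctness test.

```python
from fractions import Fraction


def dot(xs, ws):
    """Algorithm specification: a plain dot product."""
    return sum(x * w for x, w in zip(xs, ws))


def dot_tiled(xs, ws, tile):
    """Hypothetical compute customization: the same dot product with a tiled loop."""
    total = 0
    for start in range(0, len(xs), tile):
        for i in range(start, min(start + tile, len(xs))):
            total += xs[i] * ws[i]
    return total


def to_fixed(value, frac_bits):
    """Hypothetical data type customization: round to a grid of 2**-frac_bits."""
    scale = 1 << frac_bits
    return Fraction(round(value * scale), scale)


# Exact rational inputs, so any difference comes from the customization itself.
xs = [Fraction(1, 3), Fraction(1, 5), Fraction(1, 7)]
ws = [Fraction(3), Fraction(5), Fraction(7)]

reference = dot(xs, ws)                            # exact result: 3
tiled = dot_tiled(xs, ws, tile=2)                  # reordering only: still 3
quantized = dot([to_fixed(x, 4) for x in xs], ws)  # 4 fractional bits: 11/4

print(reference, tiled, quantized)
```

In this sketch the loop transformation preserves the result, but the narrower data type changes it, which is one way a customization that looks harmless can require a different notion of correctness than bit-exact equivalence.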

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN:9781450391429

DOI:10.1145/3489517

General Chair:
Rob Oshana
NXP

Copyright © 2022 ACM.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference

Sponsor: SIGDA

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
