DOI: 10.1145/3400302.3415645 — ICCAD Conference Proceedings — Research article

DeepBurning-GL: an automated framework for generating graph neural network accelerators

Published: 17 December 2020

Abstract

Building FPGA-based graph learning accelerators is very time-consuming due to low-level RTL programming and the complicated FPGA design flow. It also requires architecture and hardware expertise from Graph Neural Network (GNN) application developers to tailor efficient accelerator designs on FPGAs. This work proposes an automation framework, DeepBurning-GL, which is compatible with state-of-the-art graph learning frameworks such as Deep Graph Library, so that developers can easily generate application-specific GNN accelerators from software-described models. First, DeepBurning-GL employs a GNN performance analyzer to locate the performance bottleneck of a specific GNN application and to decide the major accelerator architecture and parameters that meet user-specified constraints. Second, DeepBurning-GL provides a series of pre-built design templates, such as computing templates and memory templates, which can be parameterized and fused to generate the final accelerator design. It also includes an optimizer that automatically tunes the accelerator's architectural parameters. In evaluation, we use DeepBurning-GL to generate customized accelerators on three different FPGA platforms for various GNN models and workloads. The experimental results show that the generated accelerators achieve 179.4X and 40.1X energy-efficiency improvements on average over CPU and GPU solutions, respectively, and deliver a 6.28X speedup and a 6.73X energy-efficiency improvement on average compared to the latest GNN accelerator, HyGCN, on an Alveo U50.
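The abstract describes a three-step flow: a performance analyzer locates the bottleneck stage of a GNN workload, a matching pre-built template is parameterized, and an optimizer tunes the architectural parameters under user constraints. The sketch below illustrates that control flow only; all names, templates, and numbers are hypothetical and are not DeepBurning-GL's actual API.

```python
# Hypothetical sketch of an analyzer -> template -> optimizer flow like the
# one the abstract describes. Illustrative only, not DeepBurning-GL's API.
from dataclasses import dataclass

@dataclass
class Template:
    name: str               # e.g. an aggregation or combination engine
    dsp_per_lane: int       # FPGA DSPs consumed per parallel lane

def analyze_bottleneck(profile: dict) -> str:
    """Return the stage with the largest share of execution time."""
    return max(profile, key=profile.get)

def optimize_lanes(tmpl: Template, dsp_budget: int) -> int:
    """Pick the largest lane count that fits the DSP budget."""
    return max(1, dsp_budget // tmpl.dsp_per_lane)

def generate_design(profile: dict, templates: dict, dsp_budget: int) -> dict:
    stage = analyze_bottleneck(profile)        # step 1: find the bottleneck
    tmpl = templates[stage]                    # step 2: pick a template
    lanes = optimize_lanes(tmpl, dsp_budget)   # step 3: tune its parameters
    return {"stage": stage, "template": tmpl.name, "lanes": lanes}

# Illustrative inputs: runtime fractions per stage and a DSP budget.
templates = {
    "aggregate": Template("sparse-aggregation", dsp_per_lane=8),
    "combine": Template("dense-mvm", dsp_per_lane=16),
}
profile = {"aggregate": 0.7, "combine": 0.3}
design = generate_design(profile, templates, dsp_budget=256)
print(design)  # the aggregation stage dominates, so its template is chosen
```

The real framework explores a much larger design space (memory templates, fusion, multiple constraints), but the shape of the loop, profile-driven template selection followed by parameter tuning, is the point being illustrated.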




Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020, 1396 pages
ISBN: 9781450380263
DOI: 10.1145/3400302
General Chair: Yuan Xie

In-Cooperation

• IEEE CAS
• IEEE CEDA
• IEEE CS

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. FPGA
    2. automated framework
    3. graph neural networks

    Qualifiers

    • Research-article

    Funding Sources

    • Strategic Priority Research Program of Chinese Academy of Sciences
    • National Natural Science Foundation of China
• Young Elite Scientists Sponsorship (YESS) Program
    • National Key Research and Development Program of China

Conference

ICCAD '20

Acceptance Rates

Overall acceptance rate: 457 of 1,762 submissions, 26%


Cited By

• (2024) YOLOv7-BW: An Efficient Detector for Dense Small Objects in Remote Sensing Images. Journal of Intelligent Robotics 1:1 (39-54). DOI: 10.52810/JIR.2024.004. Online publication date: 30-May-2024
• (2024) Applications of Machine Learning Models in Cardiovascular Disease. Journal of Intelligent Robotics 1:1 (26-38). DOI: 10.52810/JIR.2024.003. Online publication date: 7-May-2024
• (2024) A Survey of Computationally Efficient Graph Neural Networks for Reconfigurable Systems. Information 15:7 (377). DOI: 10.3390/info15070377. Online publication date: 28-Jun-2024
• (2024) HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures. ACM Transactions on Reconfigurable Technology and Systems 18:1 (1-26). DOI: 10.1145/3655627. Online publication date: 17-Dec-2024
• (2024) A Dynamically Pipelined Dataflow Architecture for Graph Convolutions in Real-Time Event Interpretation. 2024 IEEE 37th International System-on-Chip Conference (SOCC) (1-6). DOI: 10.1109/SOCC62300.2024.10737798. Online publication date: 16-Sep-2024
• (2023) GraphAGILE: An FPGA-Based Overlay Accelerator for Low-Latency GNN Inference. IEEE Transactions on Parallel and Distributed Systems 34:9 (2580-2597). DOI: 10.1109/TPDS.2023.3287883. Online publication date: Sep-2023
• (2023) Network Pruning for Bit-Serial Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:5 (1597-1609). DOI: 10.1109/TCAD.2022.3203955. Online publication date: May-2023
• (2023) Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (233-244). DOI: 10.1109/IPDPS54959.2023.00032. Online publication date: May-2023
• (2023) GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (1-9). DOI: 10.1109/ICCAD57390.2023.10323953. Online publication date: 28-Oct-2023
• (2023) Accelerating GNN-Based SAR Automatic Target Recognition on HBM-Enabled FPGA. 2023 IEEE High Performance Extreme Computing Conference (HPEC) (1-7). DOI: 10.1109/HPEC58863.2023.10363615. Online publication date: 25-Sep-2023
