DOI: 10.1145/3400302.3415645 — ICCAD Conference Proceedings — Research article

DeepBurning-GL: an automated framework for generating graph neural network accelerators

Published: 17 December 2020

Abstract

Building FPGA-based graph learning accelerators is very time-consuming due to low-level RTL programming and the complicated FPGA design flow. It also requires architecture and hardware expertise from Graph Neural Network (GNN) application developers to tailor efficient accelerator designs on FPGAs. This work proposes an automation framework, DeepBurning-GL, which is compatible with state-of-the-art graph learning frameworks such as Deep Graph Library, so that developers can easily generate application-specific GNN accelerators from software-described models. First, DeepBurning-GL employs a GNN performance analyzer to locate the performance bottleneck of a specific GNN application and to decide the major accelerator architecture and parameters that meet user-specified constraints. Second, DeepBurning-GL provides a series of pre-built design templates, such as computing templates and memory templates, which can be parameterized and fused to generate the final accelerator design. It also includes an optimizer that automatically tunes the accelerator's architectural parameters. In evaluation, we use DeepBurning-GL to generate customized accelerators on three different FPGA platforms for various GNN models and workloads. The experimental results show that the generated accelerators achieve 179.4X and 40.1X energy-efficiency improvements on average over CPU and GPU solutions, respectively, and deliver a 6.28X speedup and a 6.73X energy-efficiency improvement on average compared to the latest GNN accelerator, HyGCN, on an Alveo U50.
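The abstract describes a three-step flow: a performance analyzer locates the bottleneck stage of a GNN workload, a matching pre-built template is parameterized, and an optimizer tunes the architectural parameters under user constraints. The sketch below illustrates that control flow only; all names, templates, and numbers are hypothetical and are not DeepBurning-GL's actual API.

```python
# Hypothetical sketch of an analyzer -> template -> optimizer flow like the
# one the abstract describes. Illustrative only, not DeepBurning-GL's API.
from dataclasses import dataclass

@dataclass
class Template:
    name: str               # e.g. an aggregation or combination engine
    dsp_per_lane: int       # FPGA DSPs consumed per parallel lane

def analyze_bottleneck(profile: dict) -> str:
    """Return the stage with the largest share of execution time."""
    return max(profile, key=profile.get)

def optimize_lanes(tmpl: Template, dsp_budget: int) -> int:
    """Pick the largest lane count that fits the DSP budget."""
    return max(1, dsp_budget // tmpl.dsp_per_lane)

def generate_design(profile: dict, templates: dict, dsp_budget: int) -> dict:
    stage = analyze_bottleneck(profile)        # step 1: find the bottleneck
    tmpl = templates[stage]                    # step 2: pick a template
    lanes = optimize_lanes(tmpl, dsp_budget)   # step 3: tune its parameters
    return {"stage": stage, "template": tmpl.name, "lanes": lanes}

# Illustrative inputs: runtime fractions per stage and a DSP budget.
templates = {
    "aggregate": Template("sparse-aggregation", dsp_per_lane=8),
    "combine": Template("dense-mvm", dsp_per_lane=16),
}
profile = {"aggregate": 0.7, "combine": 0.3}
design = generate_design(profile, templates, dsp_budget=256)
print(design)  # the aggregation stage dominates, so its template is chosen
```

The real framework explores a much larger design space (memory templates, fusion, multiple constraints), but the shape of the loop, profile-driven template selection followed by parameter tuning, is the point being illustrated.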




Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020, 1396 pages
ISBN: 9781450380263
DOI: 10.1145/3400302
General Chair: Yuan Xie

In-Cooperation

• IEEE CAS
• IEEE CEDA
• IEEE CS

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. FPGA
    2. automated framework
    3. graph neural networks

    Qualifiers

    • Research-article

    Funding Sources

    • Strategic Priority Research Program of Chinese Academy of Sciences
    • National Natural Science Foundation of China
• Young Elite Scientists Sponsorship (YESS) Program
    • National Key Research and Development Program of China

Conference

ICCAD '20

Acceptance Rates

Overall acceptance rate: 457 of 1,762 submissions, 26%


Cited By

• (2024) YOLOv7-BW: An Efficient Detector for Dense Small Objects in Remote Sensing Images. Journal of Intelligent Robotics 1:1 (39-54). DOI: 10.52810/JIR.2024.004. Online publication date: 30-May-2024
• (2024) Applications of Machine Learning Models in Cardiovascular Disease. Journal of Intelligent Robotics 1:1 (26-38). DOI: 10.52810/JIR.2024.003. Online publication date: 7-May-2024
• (2024) A Survey of Computationally Efficient Graph Neural Networks for Reconfigurable Systems. Information 15:7 (377). DOI: 10.3390/info15070377. Online publication date: 28-Jun-2024
• (2024) HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures. ACM Transactions on Reconfigurable Technology and Systems 18:1 (1-26). DOI: 10.1145/3655627. Online publication date: 17-Dec-2024
• (2024) A Dynamically Pipelined Dataflow Architecture for Graph Convolutions in Real-Time Event Interpretation. 2024 IEEE 37th International System-on-Chip Conference (SOCC) (1-6). DOI: 10.1109/SOCC62300.2024.10737798. Online publication date: 16-Sep-2024
• (2023) GraphAGILE: An FPGA-Based Overlay Accelerator for Low-Latency GNN Inference. IEEE Transactions on Parallel and Distributed Systems 34:9 (2580-2597). DOI: 10.1109/TPDS.2023.3287883. Online publication date: Sep-2023
• (2023) Network Pruning for Bit-Serial Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:5 (1597-1609). DOI: 10.1109/TCAD.2022.3203955. Online publication date: May-2023
• (2023) Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (233-244). DOI: 10.1109/IPDPS54959.2023.00032. Online publication date: May-2023
• (2023) GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) (1-9). DOI: 10.1109/ICCAD57390.2023.10323953. Online publication date: 28-Oct-2023
• (2023) Accelerating GNN-Based SAR Automatic Target Recognition on HBM-Enabled FPGA. 2023 IEEE High Performance Extreme Computing Conference (HPEC) (1-7). DOI: 10.1109/HPEC58863.2023.10363615. Online publication date: 25-Sep-2023
