
Graph-OPU: A Highly Flexible FPGA-Based Overlay Processor for Graph Neural Networks

Published: 18 November 2024

Abstract

Field-programmable gate arrays (FPGAs) are ideal candidates for accelerating graph neural networks (GNNs). However, the FPGA redeployment process is time-consuming when updating or switching between diverse GNN models across different applications. Existing GNN processors eliminate the need for FPGA redeployment when switching between GNN models, but adapting to different matrix multiplication types by switching processing units decreases hardware utilization. In addition, the limited bandwidth of DDR memory constrains further improvements in hardware performance. This article proposes Graph-OPU, a highly flexible FPGA-based overlay processor for GNN acceleration. Graph-OPU offers users excellent flexibility and programmability: the executable code of a GNN model is automatically compiled and reloaded without FPGA redeployment. First, we customize the compiler and instruction set for the inference process of different GNN models. Second, we customize the datapath and optimize the data format in the microarchitecture to fully leverage high-bandwidth memory (HBM). Third, we design a unified matrix-multiplication unit that handles both sparse-dense matrix multiplication (SpMM) and general matrix multiplication (GEMM), enhancing Graph-OPU performance. During execution, the computational units are shared between SpMM and GEMM rather than switched, which improves hardware utilization. Finally, we implement a hardware prototype on the Xilinx Alveo U50 and test mainstream GNN models on various datasets. Experimental results show that Graph-OPU achieves up to 1,654× and 63× speedup, as well as up to 5,305× and 422× energy-efficiency gains, compared to CPU and GPU implementations, respectively. Graph-OPU outperforms state-of-the-art (SOTA) end-to-end overlay accelerators for GNNs, reducing latency by an average of 1.36× and improving energy efficiency by an average of 1.41×. Moreover, Graph-OPU achieves an average 1.45× improvement in end-to-end latency over the SOTA GNN processor. Graph-OPU represents an in-depth study of an FPGA-based overlay processor for GNNs, offering high flexibility, speedup, and energy efficiency.
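The key observation behind the unified SpMM/GEMM design can be illustrated with a minimal NumPy sketch (this is an illustrative software analogy, not the paper's actual hardware datapath, and the function names are hypothetical): both operations reduce to the same row-wise multiply-accumulate, so shared compute units only need a different operand-fetch path when the input matrix is sparse (here, CSR format).

```python
import numpy as np

def gemm(a, b):
    """Dense GEMM expressed as row-wise multiply-accumulate (MAC)."""
    m, k = a.shape
    out = np.zeros((m, b.shape[1]))
    for i in range(m):
        for j in range(k):
            # MAC: scale row j of B by a scalar, accumulate into row i
            out[i] += a[i, j] * b[j]
    return out

def spmm_csr(indptr, indices, data, b):
    """SpMM over a CSR matrix using the SAME MAC pattern as gemm();
    only the operand fetch differs (nonzeros instead of all entries)."""
    m = len(indptr) - 1
    out = np.zeros((m, b.shape[1]))
    for i in range(m):
        for p in range(indptr[i], indptr[i + 1]):
            # identical multiply-accumulate as in gemm()
            out[i] += data[p] * b[indices[p]]
    return out
```

Because the inner operation is identical, a single set of multiply-accumulate units can serve both kernels without being reconfigured, which is the utilization advantage the abstract describes.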



Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 4, December 2024, 303 pages
EISSN: 1936-7414
DOI: 10.1145/3613637
Editor: Deming Chen

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 18 November 2024
Online AM: 02 September 2024
Accepted: 19 August 2024
Revised: 20 June 2024
Received: 20 September 2023

    Author Tags

    1. Graph Neural Networks
    2. Custom Processor
    3. Hardware Accelerator

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research and Development Program of China
    • Shanghai Pujiang Program
    • CFFF platform of Fudan University
