Research article · DOI: 10.1145/3470496.3527439 · ISCA Conference Proceedings

Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network

Published: 11 June 2022

Abstract

Graph neural networks (GNNs) are a promising emerging application for link prediction, recommendation, and more. Existing hardware innovation, however, is limited to single-machine GNN (SM-GNN), whereas enterprises typically adopt huge graphs with large-scale distributed GNN (LSD-GNN), which must be carried out with distributed in-memory storage. LSD-GNN differs substantially from SM-GNN in system architecture demands, workflow and operators, and hence in its characterization.
In this paper, we first quantitatively characterize LSD-GNN with an industrial-grade framework and application, and conclude that its challenges lie in graph sampling, including distributed graph access, long latency, and underutilized communication and memory bandwidth. These challenges are missing from previous SM-GNN-targeted research. We then propose a customized hardware architecture to address them: a fully pipelined access-engine architecture for graph access and sampling, low-latency and bandwidth-efficient customized memory-over-fabric hardware, and a RISC-V-centric control system that provides good programmability. We implement the proposed architecture, with full software support, in a 4-card FPGA heterogeneous proof-of-concept (PoC) system. Based on measurements from the FPGA PoC, we demonstrate that a single FPGA can provide the sampling capability of up to 894 vCPUs. With the goal of being profitable, programmable, and scalable, we further integrate the architecture into an FPGA cloud (FaaS) at hyperscale, along with the industrial software framework. We explicitly explore eight FaaS architectures that carry the proposed accelerator hardware. We conclude that off-the-shelf FaaS.base can already provide a 2.47× performance-per-dollar improvement with our hardware. With architecture optimizations, FaaS.comm-opt with customized FPGA fabrics pushes the benefit to 7.78×, and FaaS.mem-opt with FPGA-local DRAM and high-speed links to GPU further unleashes it to 12.58×.
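The sampling workload at the heart of these challenges can be illustrated with a toy sketch (not from the paper; the function names and hash-partitioning scheme are assumptions for illustration): multi-hop neighbor sampling over a graph whose adjacency lists are spread across machines, where every hop issues many small random lookups to remote partitions. This access pattern is what leaves latency exposed and communication bandwidth underutilized in LSD-GNN.

```python
import random

# Toy illustration (not from the paper): multi-hop neighbor sampling over a
# graph partitioned across "machines". Every hop can touch a different
# partition, so sampling degenerates into many small, random remote lookups.

NUM_PARTITIONS = 4

def partition_of(node):
    # Hash-partitioning of nodes across machines (an assumption of this sketch).
    return node % NUM_PARTITIONS

def build_partitions(edges):
    # Each partition stores adjacency lists only for the source nodes it owns.
    parts = [{} for _ in range(NUM_PARTITIONS)]
    for src, dst in edges:
        parts[partition_of(src)].setdefault(src, []).append(dst)
    return parts

def sample_neighbors(parts, seeds, fanouts, rng):
    # Returns the sampled frontier per hop and the number of simulated
    # remote lookups (one request per node, to the partition that owns it).
    remote_lookups = 0
    frontier = list(seeds)
    layers = [frontier]
    for fanout in fanouts:
        nxt = []
        for node in frontier:
            remote_lookups += 1
            neigh = parts[partition_of(node)].get(node, [])
            if neigh:
                nxt.extend(rng.choices(neigh, k=min(fanout, len(neigh))))
        frontier = nxt
        layers.append(frontier)
    return layers, remote_lookups

if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(i, (i * 3 + 1) % 32) for i in range(32)] + \
            [(i, (i + 7) % 32) for i in range(32)]
    parts = build_partitions(edges)
    layers, lookups = sample_neighbors(parts, seeds=[0, 1], fanouts=[2, 2], rng=rng)
    print(f"hops={len(layers) - 1} lookups={lookups}")
```

Note how the lookup count grows multiplicatively with fanout per hop while each lookup fetches only a few bytes: a pattern that rewards the pipelined access engine and memory-over-fabric hardware the paper proposes, rather than raw streaming bandwidth.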




Published In

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
June 2022, 1097 pages
ISBN: 9781450386104
DOI: 10.1145/3470496

In-Cooperation

• IEEE CS TCCA: IEEE Computer Society Technical Committee on Computer Architecture

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. FPGA-as-a-service
        2. accelerator
        3. graph neural network


Acceptance Rates

ISCA '22 paper acceptance rate: 67 of 400 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)


Cited By

• (2025) "Survey on Characterizing and Understanding GNNs From a Computer Architecture Perspective." IEEE Transactions on Parallel and Distributed Systems 36(3), 537-552. DOI: 10.1109/TPDS.2025.3532089
• (2024) "A Survey of Computationally Efficient Graph Neural Networks for Reconfigurable Systems." Information 15(7), 377. DOI: 10.3390/info15070377
• (2024) "SEGNN4SLP: Structure Enhanced Graph Neural Networks for Service Link Prediction." International Journal of Advanced Network, Monitoring and Controls 9(4), 9-18. DOI: 10.2478/ijanmc-2024-0032
• (2024) "A Survey on Graph Neural Network Acceleration: A Hardware Perspective." Chinese Journal of Electronics 33(3), 601-622. DOI: 10.23919/cje.2023.00.135
• (2024) "HiHGNN: Accelerating HGNNs Through Parallelism and Data Reusability Exploitation." IEEE Transactions on Parallel and Distributed Systems 35(7), 1122-1138. DOI: 10.1109/TPDS.2024.3394841
• (2024) "HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform." IEEE Transactions on Parallel and Distributed Systems 35(5), 707-719. DOI: 10.1109/TPDS.2024.3371332
• (2024) "DPU-Direct: Unleashing Remote Accelerators via Enhanced RDMA for Disaggregated Datacenters." IEEE Transactions on Computers 73(8), 2081-2095. DOI: 10.1109/TC.2024.3404089
• (2024) "Acceleration of Graph Neural Networks with Heterogenous Accelerators Architecture." In Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 1081-1089. DOI: 10.1109/SCW63240.2024.00148
• (2024) "F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS." In 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3739-3752. DOI: 10.1109/ICDE60146.2024.00287
• (2024) "Celeritas: Out-of-Core Based Unsupervised Graph Neural Network via Cross-Layer Computing." In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 91-107. DOI: 10.1109/HPCA57654.2024.00018
