DOI: 10.1145/3431920.3439288
research-article

GraSU: A Fast Graph Update Library for FPGA-based Dynamic Graph Processing

Published: 17 February 2021

Abstract

Existing FPGA-based graph accelerators, typically designed for static graphs, rarely handle dynamic graphs, which often undergo substantial graph updates (e.g., edge/node insertions and deletions) over time. In this paper, we aim to fill this gap. The key innovation of this work is to build an FPGA-based dynamic graph accelerator easily from any off-the-shelf static graph accelerator with minimal hardware engineering effort, rather than from scratch. We observe spatial similarity in dynamic graph updates: most graph updates involve only a small fraction of vertices. We therefore propose an FPGA library, called GraSU, that exploits spatial similarity for fast graph updates. GraSU uses differential data management, which retains the high-value data (data that will be frequently accessed) in the specialized on-chip UltraRAM while the overwhelming majority of low-value data resides in off-chip memory. GraSU can thus transform most of the off-chip communication arising from dynamic graph updates into fast on-chip memory accesses. Our experience shows that GraSU can be easily integrated into existing state-of-the-art static graph accelerators by modifying only 11 lines of code. Our implementation atop AccuGraph on a Xilinx Alveo™ U250 board outperforms two state-of-the-art CPU-based dynamic graph systems, Stinger and Aspen, by an average of 34.24× and 4.42× in update throughput, and further improves overall efficiency by 9.80× and 3.07× on average.
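
To make the spatial-similarity observation concrete, the following Python sketch measures how much of an update batch is concentrated on a few vertices. It is an illustration under stated assumptions, not code from GraSU: each update is charged to its source vertex (the vertex whose adjacency list the insertion or deletion would modify), and the helper name `hot_vertex_coverage` and the example batch are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def hot_vertex_coverage(updates: List[Edge], k: int) -> float:
    """Fraction of updates that touch the k most frequently updated vertices.

    Assumption (hypothetical): each update is charged to its source vertex,
    i.e. the vertex whose adjacency list an edge insertion/deletion modifies.
    """
    if not updates:
        return 0.0
    per_vertex = Counter(src for src, _ in updates)  # update count per source vertex
    hot_updates = sum(count for _, count in per_vertex.most_common(k))
    return hot_updates / len(updates)


if __name__ == "__main__":
    # Hypothetical skewed batch: vertex 0 gains 90 edges, ten other vertices gain one each.
    batch = [(0, d) for d in range(1, 91)] + [(v, 0) for v in range(1, 11)]
    print(hot_vertex_coverage(batch, k=1))  # prints 0.9
```

A coverage close to 1 for a small k is the situation in which keeping just those few vertices in fast on-chip memory would absorb most update traffic.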

Supplementary Material

MP4 File (3431920.3439288.mp4)
This presentation outlines the study of FPGA-based dynamic graph processing conducted by Qinggang Wang et al. In this paper, we introduce a graph-structured update library, called GraSU, for high-throughput graph updates on FPGAs. GraSU can be easily integrated into any existing FPGA-based static graph accelerator by modifying only a few lines of code, enabling it to handle dynamic graphs. GraSU features two key designs: incremental value measurement and value-aware differential memory management. We integrate GraSU into AccuGraph, a state-of-the-art static graph accelerator, to drive dynamic graph processing. Our implementation on a Xilinx U250 board demonstrates that the dynamic graph version of AccuGraph outperforms two state-of-the-art CPU-based dynamic graph systems, Stinger and Aspen.
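
The two designs named above can be pictured with a small software model. This is a hypothetical sketch, not GraSU's hardware design: it assumes that a vertex's value is simply its cumulative update count, that on-chip memory holds a fixed number of vertices (`onchip_slots`), and that placement is refreshed between batches. The class and method names are illustrative only.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


class ValueAwarePlacement:
    """Toy model of value-aware differential placement (hypothetical).

    A fixed number of vertex slots stands in for on-chip memory; every other
    vertex is assumed to live off-chip.
    """

    def __init__(self, onchip_slots: int) -> None:
        self.onchip_slots = onchip_slots
        self.value: Counter = Counter()  # running per-vertex value (update count)
        self.onchip: Set[int] = set()    # vertices currently held on-chip

    def apply_batch(self, batch: List[Edge]) -> Tuple[int, int]:
        """Serve one batch of updates; return (on-chip accesses, off-chip accesses)."""
        onchip_hits = sum(1 for src, _ in batch if src in self.onchip)
        offchip = len(batch) - onchip_hits
        # Incremental value measurement: fold this batch into the running counts
        # instead of recomputing values over the whole graph.
        self.value.update(src for src, _ in batch)
        # Re-place for the next batch: the highest-valued vertices go on-chip.
        self.onchip = {v for v, _ in self.value.most_common(self.onchip_slots)}
        return onchip_hits, offchip


if __name__ == "__main__":
    placement = ValueAwarePlacement(onchip_slots=2)
    batch = [(0, 5), (0, 6), (1, 7), (2, 8), (0, 9), (1, 4)]
    print(placement.apply_batch(batch))  # cold start: (0, 6)
    print(placement.apply_batch(batch))  # vertices 0 and 1 now on-chip: (5, 1)
```

The first batch misses entirely because no values have been measured yet; once they have, the same skewed batch is served mostly from the small on-chip set.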

Published In

FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2021
240 pages
ISBN: 9781450382182
DOI: 10.1145/3431920
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. accelerators
2. dynamic graph
3. library

Acceptance Rates

Overall acceptance rate: 125 of 627 submissions (20%)

Article Metrics

• Downloads (last 12 months): 116
• Downloads (last 6 weeks): 18
Reflects downloads up to 05 Mar 2025.

Cited By

• (2024) Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs. ACM Transactions on Reconfigurable Technology and Systems 17:3 (1-29). DOI: 10.1145/3662002. Online publication date: 30-Apr-2024
• (2024) LSGraph: A Locality-centric High-performance Streaming Graph Engine. Proceedings of the Nineteenth European Conference on Computer Systems (33-49). DOI: 10.1145/3627703.3650076. Online publication date: 22-Apr-2024
• (2024) PhGraph: A High-Performance ReRAM-Based Accelerator for Hypergraph Applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:5 (1318-1331). DOI: 10.1109/TCAD.2023.3343228. Online publication date: May-2024
• (2024) PDG: A Prefetcher for Dynamic Graph Updating. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:4 (1246-1259). DOI: 10.1109/TCAD.2023.3335880. Online publication date: Apr-2024
• (2024) A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (437-450). DOI: 10.1109/MICRO61859.2024.00040. Online publication date: 2-Nov-2024
• (2024) DAUSK: A Transactional Graph Structure for Skewed Dynamic Graph Storage. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA) (1216-1223). DOI: 10.1109/ISPA63168.2024.00163. Online publication date: 30-Oct-2024
• (2024) Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. Journal of Computer Science and Technology 39:2 (245-266). DOI: 10.1007/s11390-024-4150-0. Online publication date: 1-Mar-2024
• (2023) MEGA Evolving Graph Accelerator. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (310-323). DOI: 10.1145/3613424.3614260. Online publication date: 28-Oct-2023
• (2023) Vidi: Record Replay for Reconfigurable Hardware. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (806-820). DOI: 10.1145/3582016.3582040. Online publication date: 25-Mar-2023
• (2023) CommonGraph: Graph Analytics on Evolving Data. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (133-145). DOI: 10.1145/3575693.3575713. Online publication date: 27-Jan-2023
