research-article

TDGraph: a topology-driven accelerator for high-performance streaming graph processing

Authors:
Jin Zhao (Huazhong University of Science and Technology, China)
Yun Yang (Huazhong University of Science and Technology, China)
Yu Zhang (Huazhong University of Science and Technology, China)
Xiaofei Liao (Huazhong University of Science and Technology, China)
Lin Gu (Huazhong University of Science and Technology, China)
Ligang He (University of Warwick, United Kingdom)
Bingsheng He (National University of Singapore, Singapore)
Hai Jin (Huazhong University of Science and Technology, China)
Haikun Liu (Huazhong University of Science and Technology, China)
Xinyu Jiang (Huazhong University of Science and Technology, China)
Hui Yu (Huazhong University of Science and Technology, China)

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, June 2022, Pages 116–129
https://doi.org/10.1145/3470496.3527409

Published: 11 June 2022

ABSTRACT

Many solutions have recently been proposed to support the processing of streaming graphs. However, when each snapshot of a streaming graph is processed, the new states of the vertices affected by graph updates propagate irregularly along the graph topology. Despite years of research effort, existing approaches still suffer from redundant computation and irregular memory access, which severely underutilize a many-core processor. To address these issues, this paper proposes TDGraph, a topology-driven programmable accelerator and the first accelerator to augment many-core processors for high-performance processing of streaming graphs. Specifically, we build an efficient topology-driven incremental execution approach into the accelerator design to achieve more regular state propagation and better data locality. TDGraph takes the vertices affected by graph updates as roots, prefetches other vertices along the graph topology, and synchronizes their incremental computations on the fly. In this way, most state propagations originating from multiple vertices affected by different graph updates can be conducted together along the graph topology, which helps reduce redundant computation and data access cost. In addition, by efficiently coalescing accesses to vertex states, TDGraph further improves the utilization of the cache and memory bandwidth. We have evaluated TDGraph on a simulated 64-core processor. The results show that the state-of-the-art software system achieves a speedup of 7.1 to 21.4 times after integrating TDGraph, while incurring only 0.73% area cost. Compared with four cutting-edge accelerators, i.e., HATS, Minnow, PHI, and DepGraph, TDGraph achieves speedups of 4.6 to 12.7, 3.2 to 8.6, 3.8 to 9.7, and 2.3 to 6.1 times, respectively.
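The benefit of conducting state propagations from several affected vertices together can be illustrated with a small software analogy. The sketch below is not the TDGraph hardware design. It assumes incremental single-source shortest paths, insert-only updates with non-negative weights, and a priority queue to order the merged traversal; the function names (`propagate`, `apply_batch`, `make_graph`) and the toy graph are hypothetical. It counts vertex-state writes when each update is propagated separately versus when all affected vertices are used as roots of one traversal.

```python
import heapq

INF = float("inf")

def dijkstra(graph, source):
    """From-scratch baseline: shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def propagate(graph, dist, roots):
    """Push improved states outward from all roots in one traversal.
    Returns the number of vertex-state writes performed."""
    heap = [(dist[r], r) for r in roots]
    heapq.heapify(heap)
    writes = 0
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # a better state for u was already propagated
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                writes += 1
                heapq.heappush(heap, (d + w, v))
    return writes

def apply_batch(graph, dist, insertions, merged):
    """Apply edge insertions (u, v, w); return total state writes."""
    writes = 0
    roots = set()
    for u, v, w in insertions:
        graph.setdefault(u, []).append((v, w))
        if dist.get(u, INF) + w < dist.get(v, INF):
            dist[v] = dist[u] + w
            writes += 1
            if merged:
                roots.add(v)  # defer: propagate all roots together
            else:
                writes += propagate(graph, dist, [v])  # one update at a time
    if merged:
        writes += propagate(graph, dist, roots)
    return writes

def make_graph():
    # Toy chain s -> a -> b -> c -> d (hypothetical data).
    return {"s": [("a", 10)], "a": [("b", 1)], "b": [("c", 1)], "c": [("d", 1)]}

batch = [("s", "b", 5), ("s", "a", 2)]

g1 = make_graph()
d1 = dijkstra(g1, "s")
separate_writes = apply_batch(g1, d1, batch, merged=False)

g2 = make_graph()
d2 = dijkstra(g2, "s")
merged_writes = apply_batch(g2, d2, batch, merged=True)

print(separate_writes, merged_writes)  # prints "7 5"
```

In this toy case both variants reach the same final distances, but the separate variant rewrites `b`, `c`, and `d` twice because the second update supersedes the first. The merged variant writes `b` twice and every other vertex once, which is the kind of redundant computation the abstract refers to. A smaller write count is not guaranteed for every input.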

Index Terms

TDGraph: a topology-driven accelerator for high-performance streaming graph processing
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Data flow architectures
      2. Special purpose systems
    2. Parallel architectures
      1. Multicore architectures

Published in
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
June 2022
1097 pages
ISBN: 9781450386104
DOI: 10.1145/3470496
General Chairs: Valentina Salapura (Google), Mohamed Zahran (New York University)
Program Chairs: Fred Chong (The University of Chicago), Lingjia Tang (The University of Michigan)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
accelerator
incremental computation
many-core processor
state propagation
streaming graphs
Qualifiers
- research-article
Acceptance Rates
ISCA '22 Paper Acceptance Rate: 67 of 400 submissions, 17%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%