Gzippo: Highly-Compact Processing-in-Memory Graph Accelerator Alleviating Sparsity and Redundancy

Authors:

Rachata Ausavarungnirun,

Xiaoyao Liang

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

Article No.: 115, Pages 1 - 9

https://doi.org/10.1145/3508352.3549372

Published: 22 December 2022

Abstract

Graph applications play a significant role in real-world data computation. However, their memory access patterns, which exhibit a low compute-to-communication ratio, poor temporal locality, and poor spatial locality, become the performance bottleneck. Existing RRAM-based processing-in-memory (PIM) accelerators reduce data movement but fail to address both the sparsity and the redundancy of graph data. In this work, we present Gzippo, a highly compact design that supports graph computation in the compressed sparse format. Gzippo employs a tandem-isomorphic-crossbar architecture both to eliminate redundant searches and sequential indexing during iterations, and to remove the sparsity that leads to ineffective computation on zero values. Gzippo achieves a 3.0× (up to 17.4×) performance speedup and 23.9× (up to 163.2×) higher energy efficiency over a state-of-the-art RRAM-based PIM accelerator.
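For readers unfamiliar with compressed sparse storage, the following is a minimal software sketch of one common variant, compressed sparse row (CSR). The abstract does not say which variant Gzippo uses or how it maps onto crossbars, so this is only an illustration of the general idea; the helper names `to_csr` and `neighbors` and the example edge list are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source vertex, destination vertex, weight)


def to_csr(num_vertices: int, edges: List[Edge]):
    """Build CSR arrays (row_ptr, col_idx, values) from an edge list.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count the outgoing edges of every source vertex.
    counts = [0] * num_vertices
    for src, _, _ in edges:
        counts[src] += 1

    # A prefix sum over the counts gives the start offset of each row.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + counts[v]

    # Scatter each edge into the next free slot of its row.
    col_idx = [0] * len(edges)
    values = [0.0] * len(edges)
    cursor = row_ptr[:-1].copy()
    for src, dst, weight in edges:
        pos = cursor[src]
        col_idx[pos] = dst
        values[pos] = weight
        cursor[src] += 1
    return row_ptr, col_idx, values


def neighbors(row_ptr, col_idx, values, v):
    """Return the (destination, weight) pairs of vertex v."""
    start, end = row_ptr[v], row_ptr[v + 1]
    return list(zip(col_idx[start:end], values[start:end]))


# A 4-vertex graph with 4 edges: a dense adjacency matrix would hold 16 cells.
edges = [(0, 1, 2.0), (0, 3, 1.0), (2, 1, 5.0), (3, 2, 4.0)]
row_ptr, col_idx, values = to_csr(4, edges)
print(row_ptr, col_idx, values)
print(neighbors(row_ptr, col_idx, values, 0))
```

In this sketch only the four non-zero edges are stored, so no work is spent on the twelve zero entries that a dense 4×4 adjacency matrix would contain; the cost is the extra indirection through `row_ptr` on every lookup.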

Cited By

Kang H, Zhao Y, Blelloch G, Dhulipala L, Gu Y, McGuffey C, Gibbons P, Agrawal K, Shun J. (2023). PIM-trie: A Skew-resistant Trie for Processing-in-Memory. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures. 10.1145/3558481.3591070 (1-14). Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1145/3558481.3591070

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

October 2022

1467 pages

ISBN:9781450392174

DOI:10.1145/3508352

Conference Chair: Tulika Mitra (National University of Singapore)

Program Chairs: Evangeline Young (The Chinese University of Hong Kong), Jinjun Xiong (University at Buffalo (UB))

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE-EDS: Electronic Devices Society
IEEE CAS
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICCAD '22

Sponsor:

SIGDA

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design

October 30 - November 3, 2022

San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
