ABSTRACT
Computing-in-memory (CIM) is emerging as a promising architecture for accelerating graph convolutional networks (GCNs), which are typically bounded by redundant and irregular memory transactions. Current analog-based CIM requires frequent analog-to-digital and digital-to-analog (AD/DA) conversions that dominate the overall area and power consumption. Furthermore, analog non-idealities degrade the accuracy and reliability of CIM. In this work, we propose DCIM-GCN, an SRAM-based digital CIM system that accelerates memory-intensive GCNs, with innovations spanning the circuit level, where costly AD/DA converters are eliminated, to the architecture level, where the irregularity and sparsity of graph data are addressed. DCIM-GCN achieves 2.07×, 1.76×, and 1.89× speedup and 29.98×, 1.29×, and 3.73× energy-efficiency improvement on average over the CIM-based accelerators PIMGCN, TARe, and PIM-GCN, respectively.
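To make the memory-access pattern concrete, the sketch below implements one GCN propagation step, H' = ReLU(Â·H·W), in NumPy. The dense H·W combination is a regular GEMM, while the Â·(H·W) aggregation over the adjacency structure is the sparse, irregular step whose memory traffic CIM designs such as DCIM-GCN target. The toy graph, feature sizes, and weights are illustrative only and are not taken from the paper.

```python
import numpy as np

def gcn_layer(adj, feats, weights):
    """One GCN propagation step, H' = ReLU(D^-1/2 A D^-1/2 H W) (Kipf & Welling, 2017)."""
    deg = adj.sum(axis=1)                                     # node degrees (self-loops included)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # symmetric normalization
    combined = feats @ weights                                # combination: dense, regular GEMM
    aggregated = a_hat @ combined                             # aggregation: sparse, irregular accesses
    return np.maximum(aggregated, 0.0)                        # ReLU

# 4-node toy graph with self-loops (A + I), 3 input features, 2 output features.
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)
feats = np.random.default_rng(0).standard_normal((4, 3))
weights = np.random.default_rng(1).standard_normal((3, 2))
out = gcn_layer(adj, feats, weights)
print(out.shape)  # (4, 2)
```

On real graphs the adjacency matrix is large and power-law sparse, so the aggregation step performs many scattered reads per useful multiply, which is why it dominates memory transactions rather than compute.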
- Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
- Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1):4--24, 2021.
- Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with PyTorch Geometric, 2019.
- Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, and Alex Aiken. Improving the accuracy, scalability, and performance of graph neural networks with Roc. In I. Dhillon, D. Papailiopoulos, and V. Sze, editors, Proceedings of Machine Learning and Systems, volume 2, pages 187--198, 2020.
- Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 34(1):249--270, 2022.
- Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436--444, 2015.
- Connor W. Coley, Wengong Jin, Luke Rogers, Timothy F. Jamison, Tommi S. Jaakkola, William H. Green, Regina Barzilay, and Klavs F. Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci., 10:370--377, 2019.
- Huy-Trung Nguyen, Quoc-Dung Ngo, and Van-Hoang Le. IoT botnet detection approach based on PSI graph and DGCNN classifier. In 2018 IEEE International Conference on Information Communication and Signal Processing (ICICSP), pages 118--122, 2018.
- Tian Xie and Jeffrey C. Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett., 120:145301, Apr 2018.
- Hongxia Yang. AliGraph: A comprehensive graph neural network platform. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, pages 3165--3166, New York, NY, USA, 2019. Association for Computing Machinery.
- Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):457--466, 2018.
- Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57--81, 2020.
- Mingyu Yan, Zhaodong Chen, Lei Deng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. Characterizing and understanding GCNs on GPU. IEEE Computer Architecture Letters, 19(1):22--25, 2020.
- Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, and Minyi Guo. Architectural implications of graph neural networks. IEEE Computer Architecture Letters, 19(1):59--62, 2020.
- Milind Kulkarni, Martin Burtscher, Rajeshkar Inkulu, Keshav Pingali, and Calin Cascaval. How much parallelism is there in irregular applications? In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09, pages 3--14, New York, NY, USA, 2009. Association for Computing Machinery.
- Nagadastagiri Challapalle, Sahithi Rampalli, Linghao Song, Nandhini Chandramoorthy, Karthik Swaminathan, John Sampson, Yiran Chen, and Vijaykrishnan Narayanan. GaaS-X: Graph analytics accelerator supporting sparse data representation using crossbar architectures. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 433--445, 2020.
- Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, and Martin C. Herbordt. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 922--936, 2020.
- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. EIE: Efficient inference engine on compressed deep neural network. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 243--254, 2016.
- Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. A novel zero weight/activation-aware hardware architecture of convolutional neural network. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, pages 1462--1467, 2017.
- Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. Cambricon-X: An accelerator for sparse neural networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1--12, 2016.
- A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2006.
- Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 17--30, USA, 2012. USENIX Association.
- Matthieu Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science, 407(1):458--473, 2008.
- Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. Distributed power-law graph computing: Theoretical and empirical analysis. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS'14, pages 1673--1681, Cambridge, MA, USA, 2014. MIT Press.
- Adam Auten, Matthew Tomei, and Rakesh Kumar. Hardware acceleration of graph neural networks. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1--6, 2020.
- Yintao He, Ying Wang, Cheng Liu, Huawei Li, and Xiaowei Li. TARe: Task-adaptive in-situ ReRAM computing for graph learning. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 577--582, 2021.
- Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. HyGCN: A GCN accelerator with hybrid architecture. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 15--29, 2020.
- Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, and Vijaykrishnan Narayanan. Crossbar based processing in memory accelerator architecture for graph convolutional networks. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), pages 1--9, 2021.
- Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin Herbordt, Yingyan Lin, and Ang Li. I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '21, pages 1051--1063, New York, NY, USA, 2021. Association for Computing Machinery.
- Chen-Yang Tsai, Chin-Fu Nien, Tz-Ching Yu, Hung-Yu Yeh, and Hsiang-Yun Cheng. RePIM: Joint exploitation of activation and weight repetitions for in-ReRAM DNN acceleration. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 589--594, 2021.
- Tao Yang, Dongyue Li, Yibo Han, Yilong Zhao, Fangxin Liu, Xiaoyao Liang, Zhezhi He, and Li Jiang. PIMGCN: A ReRAM-based PIM design for graph convolutional network acceleration. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 583--588, 2021.
- Miao Hu, John Paul Strachan, Zhiyong Li, Emmanuelle M. Grafals, Noraica Davila, Catherine Graves, Sity Lam, Ning Ge, Jianhua Joshua Yang, and R. Stanley Williams. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1--6, 2016.
- Teyuh Chou, Wei Tang, Jacob Botimer, and Zhengya Zhang. CASCADE: Connecting RRAMs to extend analog dataflow in an end-to-end in-memory processing paradigm. In MICRO '52: 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 114--125, New York, NY, USA, 2019. Association for Computing Machinery.
- Chuan-Jia Jhang, Cheng-Xin Xue, Je-Min Hung, Fu-Chun Chang, and Meng-Fan Chang. Challenges and trends of SRAM-based computing-in-memory for AI edge devices. IEEE Transactions on Circuits and Systems I: Regular Papers, 68(5):1773--1786, 2021.
- Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, and Shouyi Yin. A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 reconfigurable digital CIM processor with unified FP/INT pipeline and bitwise in-memory Booth multiplication for cloud deep learning acceleration. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), volume 65, pages 1--3, 2022.
- Hidehiro Fujiwara, Haruki Mori, Wei-Chang Zhao, Mei-Chen Chuang, Rawan Naous, Chao-Kai Chuang, Takeshi Hashizume, Dar Sun, Chia-Fu Lee, Kerem Akarvardar, Saman Adham, Tan-Li Chou, Mahmut Ersin Sinangil, Yih Wang, Yu-Der Chih, Yen-Huei Chen, Hung-Jen Liao, and Tsung-Yung Jonathan Chang. A 5-nm 254-TOPS/W 221-TOPS/mm2 fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), volume 65, pages 1--3, 2022.
- Baogang Zhang and Rickard Ewetz. Towards resilient deployment of in-memory neural networks with high throughput. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1081--1086, 2021.
- Amr M. S. Tosson, Shimeng Yu, Mohab H. Anis, and Lan Wei. A study of the effect of RRAM reliability soft errors on the performance of RRAM-based neuromorphic systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(11):3125--3137, 2017.
- Yu-Der Chih, Po-Hao Lee, Hidehiro Fujiwara, Yi-Chun Shih, Chia-Fu Lee, Rawan Naous, Yu-Lin Chen, Chieh-Pu Lo, Cheng-Han Lu, Haruki Mori, Wei-Chang Zhao, Dar Sun, Mahmut E. Sinangil, Yen-Huei Chen, Tan-Li Chou, Kerem Akarvardar, Hung-Jen Liao, Yih Wang, Meng-Fan Chang, and Tsung-Yung Jonathan Chang. 16.4 An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in-memory macro in 22nm for machine-learning edge applications. In 2021 IEEE International Solid-State Circuits Conference (ISSCC), volume 64, pages 252--254, 2021.
Index Terms
- DCIM-GCN: Digital Computing-in-Memory to Efficiently Accelerate Graph Convolutional Networks