DOI: 10.1145/3508352.3549333

ISSA: Input-Skippable, Set-Associative Computing-in-Memory (SA-CIM) Architecture for Neural Network Accelerators

Published: 22 December 2022

Abstract

Among several emerging architectures, computing in memory (CIM), which features in-situ analog computation, is a potential solution to the data-movement bottleneck that the von Neumann architecture imposes on artificial intelligence (AI). Interestingly, CIM has other inherent strengths, quite distinct from in-situ analog computation, that are not yet widely recognized. In this work, we point out that mutually stationary vectors (MSVs), which can be maximized by introducing associativity to CIM, are another power unique to CIM. Through MSVs, CIM gains significant freedom to dynamically vectorize the stored data (e.g., weights) and to perform agile computation using the dynamically formed vectors.
We have designed and realized an SA-CIM silicon prototype and the corresponding architecture and acceleration schemes in the TSMC 28 nm process. More specifically, the contributions of this paper are fourfold: 1) We identify MSVs as a new feature that can be exploited to address the performance and energy challenges of current CIM-based hardware. 2) We propose SA-CIM to enhance MSVs for skipping zeros, small values, and sparse vectors. 3) We propose a transposed systolic dataflow that efficiently performs 3×3 convolutions while remaining able to exploit the input-skipping schemes. 4) We propose a design flow that searches for optimal aggressive skipping-scheme setups while satisfying an accuracy-loss constraint.
The proposed ISSA architecture improves throughput by 1.91× to 2.97× and energy efficiency by 2.5× to 4.2×.
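The abstract only sketches how input skipping and the accuracy-constrained design flow interact. As a rough software illustration (not the authors' hardware or code), the following Python snippet emulates skipping zero and small-magnitude inputs in a dot product, and a toy analogue of the design-flow search that picks the most aggressive skip threshold within an accuracy-loss budget. All names, thresholds, and data below are hypothetical.

```python
import numpy as np

# Hypothetical illustration of input skipping: inputs that are zero (or
# below a magnitude threshold) contribute nothing (or little) to the
# multiply-accumulate, so the corresponding CIM activations can be skipped.
# None of these names or values come from the paper.

def skipped_dot(inputs: np.ndarray, weights: np.ndarray, threshold: float):
    """Dot product that skips zero and small-magnitude inputs."""
    keep = np.abs(inputs) > threshold       # inputs worth computing
    saved = 1.0 - keep.mean()               # fraction of work skipped
    return float(inputs[keep] @ weights[keep]), saved

def search_threshold(inputs, weights, candidates, max_rel_err=0.05):
    """Toy analogue of the design-flow search: pick the most aggressive
    skip threshold whose result stays within the accuracy-loss budget."""
    exact = float(inputs @ weights)
    best = 0.0
    for t in sorted(candidates):
        approx, _ = skipped_dot(inputs, weights, t)
        if abs(approx - exact) <= max_rel_err * max(abs(exact), 1e-9):
            best = t                        # larger t skips more inputs
    return best

rng = np.random.default_rng(0)
x = rng.integers(-8, 9, size=256).astype(float)   # toy 4-bit activations
x[rng.random(256) < 0.4] = 0.0                    # ReLU-like sparsity
w = rng.integers(-8, 9, size=256).astype(float)   # toy 4-bit weights

t = search_threshold(x, w, candidates=[0.0, 1.0, 2.0, 3.0])
result, saved = skipped_dot(x, w, t)
print(f"threshold={t}: result={result:.0f}, work skipped={saved:.0%}")
```

On this toy data, skipping zeros alone already removes roughly 40% of the multiply-accumulates; raising the threshold trades accuracy for further savings, which is the trade-off the paper's design flow automates for the hardware.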


Cited By

  • BICEP: Exploiting Bitline Inversion for Efficient Operation-Unit-Based Compute-in-Memory Architecture: No Retraining Needed! In 2023 IEEE 41st International Conference on Computer Design (ICCD), 531-534. DOI: 10.1109/ICCD58817.2023.00087. Online publication date: 6 Nov 2023.
  • Exploiting and Enhancing Computation Latency Variability for High-Performance Time-Domain Computing-in-Memory Neural Network Accelerators. In 2023 IEEE 41st International Conference on Computer Design (ICCD), 515-522. DOI: 10.1109/ICCD58817.2023.00085. Online publication date: 6 Nov 2023.
  • FM-P2L: An Algorithm Hardware Co-design of Fixed-Point MSBs with Power-of-2 LSBs in CNN Accelerators. In 2023 IEEE 41st International Conference on Computer Design (ICCD), 407-414. DOI: 10.1109/ICCD58817.2023.00067. Online publication date: 6 Nov 2023.

Index Terms

  1. ISSA: Input-Skippable, Set-Associative Computing-in-Memory (SA-CIM) Architecture for Neural Network Accelerators
            Index terms have been assigned to the content through auto-classification.

            Recommendations

            Comments

            Published In

            ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
            October 2022
            1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352

            In-Cooperation

• IEEE EDS: Electron Devices Society
            • IEEE CAS
            • IEEE CEDA

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • Research-article

            Funding Sources

            • National Science and Technology Council (NSTC)

            Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

            Acceptance Rates

            Overall Acceptance Rate 457 of 1,762 submissions, 26%

