Memristive-based Mixed-signal CGRA for Accelerating Deep Neural Network Inference

Published: 18 July 2023

Abstract

This paper presents a mixed-signal coarse-grained reconfigurable architecture (CGRA) for accelerating inference in deep neural networks (DNNs). It achieves a considerable speedup by performing dot-product computations in the analog domain, while all other computations are performed digitally. The proposed structure, called MX-CGRA, employs analog tiles built from memristor crossbars. To reduce the overhead of converting data between the analog and digital domains, a suitable interface is placed between the analog and digital tiles. In addition, the structure benefits from an efficient memory hierarchy in which data is moved as close as possible to the computing fabric. Moreover, to fully utilize the tiles, a set of micro-instructions is defined to configure both the analog and digital domains; the corresponding CGRA context words are generated from these instructions by a companion compiler tool. The efficacy of MX-CGRA is assessed by modeling the execution of state-of-the-art DNN architectures, used to classify images from the ImageNet dataset, on this structure. Simulation results show that, compared to previous mixed-signal DNN accelerators, an average throughput improvement of 2.35× is achieved.
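The analog dot-product computation at the heart of such crossbar-based accelerators can be sketched with a small numerical model of an ideal memristor crossbar: weights are mapped to device conductances, inputs are applied as row voltages, and each column current realizes one dot product via Kirchhoff's current law. This is an illustrative sketch only; the conductance range, the linear weight-to-conductance mapping, and the array size below are assumptions for the example, not values from the paper.

```python
import numpy as np

# Assumed conductance range of the memristive devices (siemens);
# illustrative values, not taken from the MX-CGRA design.
G_MIN, G_MAX = 1e-6, 1e-4

def weights_to_conductances(w):
    """Linearly map weights in [w.min(), w.max()] to [G_MIN, G_MAX]."""
    w = np.asarray(w, dtype=float)
    w_min, w_max = w.min(), w.max()
    return G_MIN + (w - w_min) / (w_max - w_min) * (G_MAX - G_MIN)

def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    return np.asarray(voltages, dtype=float) @ conductances

# A 4x3 crossbar tile: 4 input rows, 3 output columns.
weights = np.array([[ 0.2, -0.5,  0.1],
                    [ 0.7,  0.3, -0.9],
                    [-0.1,  0.8,  0.4],
                    [ 0.5, -0.2,  0.6]])
G = weights_to_conductances(weights)

# Applying input voltages to the rows yields all three dot products
# simultaneously as column currents (one analog VMM operation).
I = crossbar_vmm([0.1, 0.2, 0.0, 0.3], G)
print(I.shape)  # one current per output column -> (3,)
```

In a real mixed-signal tile, the input voltages would come from DACs and the column currents would be sensed and digitized by ADCs, which is exactly the analog/digital conversion overhead the interface between tiles is meant to amortize.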

Cited By

  • HierCGRA: A novel framework for large-scale CGRA with hierarchical modeling and automated design space exploration. ACM Transactions on Reconfigurable Technology and Systems 17, 2 (May 2024), 1–31. DOI: 10.1145/3656176
  • FCE: A fast CGRA architecture exploration framework. 2024 IEEE 17th International Conference on Solid-State & Integrated Circuit Technology (ICSICT) (Oct. 2024), 1–3. DOI: 10.1109/ICSICT62049.2024.10832017
  • Coarse-grained reconfigurable architectures for radio baseband processing: A survey. Journal of Systems Architecture 154 (Sept. 2024), 103243. DOI: 10.1016/j.sysarc.2024.103243

      Published In

      ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 4 (July 2023), 432 pages
      ISSN: 1084-4309, EISSN: 1557-7309
      DOI: 10.1145/3597460

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 18 July 2023
      Online AM: 03 May 2023
      Accepted: 26 April 2023
      Revised: 25 March 2023
      Received: 28 December 2022
      Published in TODAES Volume 28, Issue 4

      Author Tags

      1. coarse-grained reconfigurable architecture
      2. accelerator
      3. memristor
      4. convolutional neural network

      Qualifiers

      • Research-article
