Memristive-based Mixed-signal CGRA for Accelerating Deep Neural Network Inference

Published: 18 July 2023

Abstract

This paper presents a mixed-signal coarse-grained reconfigurable architecture (CGRA) for accelerating inference in deep neural networks (DNNs). It achieves a considerable speedup by performing dot-product computations in the analog domain, while all other computations are performed digitally. The proposed structure, called MX-CGRA, employs analog tiles built from memristor crossbars. To reduce the overhead of converting data between the analog and digital domains, a suitable interface is placed between the analog and digital tiles. In addition, the structure benefits from an efficient memory hierarchy in which data is moved as close as possible to the computing fabric. Moreover, to fully utilize the tiles, a set of micro-instructions is defined to configure both the analog and digital domains; the corresponding CGRA context words are generated from these instructions by a companion compiler tool. The efficacy of MX-CGRA is assessed by modeling the execution of state-of-the-art DNN architectures, used to classify images from the ImageNet dataset, on this structure. Simulation results show that, compared to previous mixed-signal DNN accelerators, an average throughput improvement of 2.35× is achieved.
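The analog dot-product computation at the heart of such crossbar-based accelerators can be sketched with a small numerical model of an ideal memristor crossbar: weights are mapped to device conductances, inputs are applied as row voltages, and each column current realizes one dot product via Kirchhoff's current law. This is an illustrative sketch only; the conductance range, the linear weight-to-conductance mapping, and the array size below are assumptions for the example, not values from the paper.

```python
import numpy as np

# Assumed conductance range of the memristive devices (siemens);
# illustrative values, not taken from the MX-CGRA design.
G_MIN, G_MAX = 1e-6, 1e-4

def weights_to_conductances(w):
    """Linearly map weights in [w.min(), w.max()] to [G_MIN, G_MAX]."""
    w = np.asarray(w, dtype=float)
    w_min, w_max = w.min(), w.max()
    return G_MIN + (w - w_min) / (w_max - w_min) * (G_MAX - G_MIN)

def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    return np.asarray(voltages, dtype=float) @ conductances

# A 4x3 crossbar tile: 4 input rows, 3 output columns.
weights = np.array([[ 0.2, -0.5,  0.1],
                    [ 0.7,  0.3, -0.9],
                    [-0.1,  0.8,  0.4],
                    [ 0.5, -0.2,  0.6]])
G = weights_to_conductances(weights)

# Applying input voltages to the rows yields all three dot products
# simultaneously as column currents (one analog VMM operation).
I = crossbar_vmm([0.1, 0.2, 0.0, 0.3], G)
print(I.shape)  # one current per output column -> (3,)
```

In a real mixed-signal tile, the input voltages would come from DACs and the column currents would be sensed and digitized by ADCs, which is exactly the analog/digital conversion overhead the interface between tiles is meant to amortize.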

Cited By

  • HierCGRA: A novel framework for large-scale CGRA with hierarchical modeling and automated design space exploration. ACM Transactions on Reconfigurable Technology and Systems 17, 2 (May 2024), 1–31. DOI: 10.1145/3656176
  • FCE: A fast CGRA architecture exploration framework. 2024 IEEE 17th International Conference on Solid-State & Integrated Circuit Technology (ICSICT) (Oct. 2024), 1–3. DOI: 10.1109/ICSICT62049.2024.10832017
  • Coarse-grained reconfigurable architectures for radio baseband processing: A survey. Journal of Systems Architecture 154 (Sept. 2024), 103243. DOI: 10.1016/j.sysarc.2024.103243

      Published In

      ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 4 (July 2023), 432 pages
      ISSN: 1084-4309, EISSN: 1557-7309
      DOI: 10.1145/3597460

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 18 July 2023
      Online AM: 03 May 2023
      Accepted: 26 April 2023
      Revised: 25 March 2023
      Received: 28 December 2022
      Published in TODAES Volume 28, Issue 4

      Author Tags

      1. coarse-grained reconfigurable architecture
      2. accelerator
      3. memristor
      4. convolutional neural network

      Qualifiers

      • Research-article
