Abstract
Traditional von Neumann processors are inefficient in energy and throughput because their separate processing and memory units force data across a bandwidth bottleneck known as the memory wall. The problem is exacerbated when the real-time implementation of artificial neural networks (ANNs), which enable many intelligent applications, demands massive parallelism and frequent data movement between the two units. One of the most promising approaches to this problem is to perform computations inside the memory core itself, which enhances both memory bandwidth and energy efficiency. This paper presents an in-memory computing architecture for ANNs targeting artificial intelligence (AI) and machine learning (ML) applications. The proposed architecture implements a multilayer perceptron on a standard six-transistor (6T) static random access memory (SRAM) core. Its on-chip training and inference reduce energy cost and enhance throughput by accessing multiple rows of the SRAM array simultaneously in a single pre-charge cycle, eliminating frequent data transfers between memory and processor. Trained and tested on the Iris dataset, the architecture consumes ≈ 22.46× less energy per decision than earlier classifiers based on the deep in-memory architecture (DIMA).
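As a purely behavioral illustration of the multi-row access scheme (not the authors' circuit), the following NumPy sketch models a single pre-charge cycle in which several wordlines are pulsed at once, so each bitline accumulates a pulse-width-weighted column sum; the array size, weight resolution, and linear-discharge assumption are all hypothetical.

import numpy as np

# Idealized functional model of a multi-row in-memory dot product:
# assumed 4-bit weights, linear bitline discharge, no device noise.
N_ROWS, N_COLS = 16, 8        # hypothetical sub-array: 16 wordlines x 8 columns
W_BITS = 4                    # assumed weight resolution

rng = np.random.default_rng(0)
weights = rng.integers(0, 2**W_BITS, size=(N_ROWS, N_COLS)).astype(float)
x = rng.random(N_ROWS)        # inputs encoded as wordline pulse widths in [0, 1]

# Conventional access: one row per pre-charge cycle -> N_ROWS cycles per dot product.
dot_serial = sum(x[i] * weights[i] for i in range(N_ROWS))

# In-memory access: all wordlines pulsed within one pre-charge cycle; each
# bitline's total voltage drop represents the pulse-width-weighted column sum.
dot_parallel = x @ weights    # one analog evaluation per column

assert np.allclose(dot_serial, dot_parallel)

Under these assumptions the two paths compute the same dot products, but the parallel path needs one pre-charge cycle instead of N_ROWS, which is the source of the energy and throughput gains reported above.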
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Kumar, A., Beeraka, S.M., Singh, J. et al. An On-Chip Trainable and Scalable In-Memory ANN Architecture for AI/ML Applications. Circuits Syst Signal Process 42, 2828–2851 (2023). https://doi.org/10.1007/s00034-022-02237-7