Abstract
Traditional von Neumann processors are inefficient in energy and throughput because their separate processing and memory units force data across a bandwidth bottleneck known as the memory wall. The problem is exacerbated when the real-time implementation of artificial neural networks (ANNs), which enable many intelligent applications, demands massive parallelism and frequent data movement between the two units. One of the most promising approaches to this problem is to perform computations inside the memory core itself, which enhances both memory bandwidth and energy efficiency. This paper presents an in-memory computing architecture for ANNs targeting artificial intelligence (AI) and machine learning (ML) applications. The proposed architecture implements a multilayer perceptron on a standard six-transistor (6T) static random access memory (SRAM) core. Its on-chip training and inference reduce energy cost and enhance throughput by accessing multiple rows of the SRAM array simultaneously in a single pre-charge cycle, eliminating frequent data transfers between memory and processor. Trained and tested on the Iris dataset, the architecture consumes ≈ 22.46× less energy per decision than earlier classifiers based on the deep in-memory architecture (DIMA).
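As a purely behavioral illustration of the multi-row access scheme (not the authors' circuit), the following NumPy sketch models a single pre-charge cycle in which several wordlines are pulsed at once, so each bitline accumulates a pulse-width-weighted column sum; the array size, weight resolution, and linear-discharge assumption are all hypothetical.

import numpy as np

# Idealized functional model of a multi-row in-memory dot product:
# assumed 4-bit weights, linear bitline discharge, no device noise.
N_ROWS, N_COLS = 16, 8        # hypothetical sub-array: 16 wordlines x 8 columns
W_BITS = 4                    # assumed weight resolution

rng = np.random.default_rng(0)
weights = rng.integers(0, 2**W_BITS, size=(N_ROWS, N_COLS)).astype(float)
x = rng.random(N_ROWS)        # inputs encoded as wordline pulse widths in [0, 1]

# Conventional access: one row per pre-charge cycle -> N_ROWS cycles per dot product.
dot_serial = sum(x[i] * weights[i] for i in range(N_ROWS))

# In-memory access: all wordlines pulsed within one pre-charge cycle; each
# bitline's total voltage drop represents the pulse-width-weighted column sum.
dot_parallel = x @ weights    # one analog evaluation per column

assert np.allclose(dot_serial, dot_parallel)

Under these assumptions the two paths compute the same dot products, but the parallel path needs one pre-charge cycle instead of N_ROWS, which is the source of the energy and throughput gains reported above.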
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Kumar, A., Beeraka, S.M., Singh, J. et al. An On-Chip Trainable and Scalable In-Memory ANN Architecture for AI/ML Applications. Circuits Syst Signal Process 42, 2828–2851 (2023). https://doi.org/10.1007/s00034-022-02237-7