DOI: 10.1145/3195970.3195989

An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology

Published: 24 June 2018

Abstract

Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work has shown that analog multipliers based on nonvolatile memories offer superior energy efficiency compared to their digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. As in some prior work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is its ultra-low-power sensing circuitry, designed around a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital converter is employed at the circuit backend. These implementations of conversion and sensing allow the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400×400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells used for offset compensation.
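
As a rough sanity check of what these figures imply (not taken from the paper itself), the short Python sketch below converts the reported numbers into throughput, power, and area estimates. The one-VMM-per-clock-cycle assumption and the 2·N·M operation count (one multiply plus one add per weight) are assumptions for illustration, not statements from the abstract.

    # Rough sanity check of the abstract's reported figures.
    # Assumptions: one full 400x400 VMM per clock cycle; a multiply-accumulate
    # is counted as 2 operations. All derived values are estimates.
    N, M = 400, 400                 # matrix dimensions quoted in the abstract
    f_clk = 400e6                   # reported operating frequency, Hz
    ops_per_vmm = 2 * N * M         # 320,000 ops per matrix pass (assumed counting)

    throughput = ops_per_vmm * f_clk        # operations per second
    power = throughput / 1.68e15            # W, implied by 1.68 POps/J
    area = throughput / 39.45e12            # mm^2, implied by 39.45 TOps/mm^2

    print(f"throughput ~ {throughput / 1e12:.0f} TOps/s")   # ~128 TOps/s
    print(f"implied power ~ {power * 1e3:.0f} mW")          # ~76 mW
    print(f"implied active area ~ {area:.1f} mm^2")         # ~3.2 mm^2

Under these assumptions, the quoted efficiency corresponds to a power budget on the order of tens of milliwatts for the full 400×400 array.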


Information

Published In

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018
1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2018


Author Tags

  1. analog computing
  2. floating-gate memory
  3. vector-by-matrix multiplier

Qualifiers

  • Research-article

Conference

DAC '18
Sponsor: DAC '18: The 55th Annual Design Automation Conference 2018
June 24-29, 2018
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%



Cited By

  • (2024) HO-FPIA: High-Order Field-Programmable Ising Arrays with In-Memory Computing. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 252-259. DOI: 10.1109/ISVLSI61997.2024.00054. Online publication date: 1-Jul-2024.
  • (2024) A carbon-nanotube-based tensor processing unit. Nature Electronics, 7(8), 684-693. DOI: 10.1038/s41928-024-01211-2. Online publication date: 22-Jul-2024.
  • (2023) Reconfigurable neuromorphic computing block through integration of flash synapse arrays and super-steep neurons. Science Advances, 9(29). DOI: 10.1126/sciadv.adg9123. Online publication date: 21-Jul-2023.
  • (2023) Analog Synaptic Devices Based on IGZO Thin-Film Transistors with a Metal–Ferroelectric–Metal–Insulator–Semiconductor Structure for High-Performance Neuromorphic Systems. Advanced Intelligent Systems, 5(12). DOI: 10.1002/aisy.202300125. Online publication date: 28-Sep-2023.
  • (2021) Photonic Matrix Computing: From Fundamentals to Applications. Nanomaterials, 11(7), 1683. DOI: 10.3390/nano11071683. Online publication date: 26-Jun-2021.
  • (2021) Combinatorial optimization by weight annealing in memristive Hopfield networks. Scientific Reports, 11(1). DOI: 10.1038/s41598-020-78944-5. Online publication date: 12-Aug-2021.
  • (2021) In-memory computing with emerging nonvolatile memory devices. Science China Information Sciences, 64(12). DOI: 10.1007/s11432-021-3327-7. Online publication date: 4-Nov-2021.
  • (2020) Emerging neural workloads and their impact on hardware. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 1462-1471. DOI: 10.5555/3408352.3408685. Online publication date: 9-Mar-2020.
  • (2020) Efficient Mixed-Signal Neurocomputing Via Successive Integration and Rescaling. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28(3), 823-827. DOI: 10.1109/TVLSI.2019.2946516. Online publication date: 24-Feb-2020.
  • (2020) In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(1), 155-165. DOI: 10.1109/TCSI.2019.2945617. Online publication date: Jan-2020.
