DOI: 10.1145/3195970.3195989

An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology

Published: 24 June 2018

Abstract

Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work has shown that analog multipliers based on nonvolatile memories offer superior energy efficiency compared to their digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. As in some prior work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is its ultra-low-power sensing circuitry, designed around a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital converter is employed at the circuit backend. These implementations of conversion and sensing allow the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400×400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells used for offset compensation.
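
As a rough sanity check of what these figures imply (not taken from the paper itself), the short Python sketch below converts the reported numbers into throughput, power, and area estimates. The one-VMM-per-clock-cycle assumption and the 2·N·M operation count (one multiply plus one add per weight) are assumptions for illustration, not statements from the abstract.

    # Rough sanity check of the abstract's reported figures.
    # Assumptions: one full 400x400 VMM per clock cycle; a multiply-accumulate
    # is counted as 2 operations. All derived values are estimates.
    N, M = 400, 400                 # matrix dimensions quoted in the abstract
    f_clk = 400e6                   # reported operating frequency, Hz
    ops_per_vmm = 2 * N * M         # 320,000 ops per matrix pass (assumed counting)

    throughput = ops_per_vmm * f_clk        # operations per second
    power = throughput / 1.68e15            # W, implied by 1.68 POps/J
    area = throughput / 39.45e12            # mm^2, implied by 39.45 TOps/mm^2

    print(f"throughput ~ {throughput / 1e12:.0f} TOps/s")   # ~128 TOps/s
    print(f"implied power ~ {power * 1e3:.0f} mW")          # ~76 mW
    print(f"implied active area ~ {area:.1f} mm^2")         # ~3.2 mm^2

Under these assumptions, the quoted efficiency corresponds to a power budget on the order of tens of milliwatts for the full 400×400 array.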


Information

Published In

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018
1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2018


Author Tags

  1. analog computing
  2. floating-gate memory
  3. vector-by-matrix multiplier

Qualifiers

  • Research-article

Conference

DAC '18
Sponsor: DAC '18: The 55th Annual Design Automation Conference 2018
June 24-29, 2018
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%



Cited By

  • (2024) HO-FPIA: High-Order Field-Programmable Ising Arrays with In-Memory Computing. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 252-259. DOI: 10.1109/ISVLSI61997.2024.00054. Online publication date: 1-Jul-2024.
  • (2024) A carbon-nanotube-based tensor processing unit. Nature Electronics, 7(8), 684-693. DOI: 10.1038/s41928-024-01211-2. Online publication date: 22-Jul-2024.
  • (2023) Reconfigurable neuromorphic computing block through integration of flash synapse arrays and super-steep neurons. Science Advances, 9(29). DOI: 10.1126/sciadv.adg9123. Online publication date: 21-Jul-2023.
  • (2023) Analog Synaptic Devices Based on IGZO Thin-Film Transistors with a Metal–Ferroelectric–Metal–Insulator–Semiconductor Structure for High-Performance Neuromorphic Systems. Advanced Intelligent Systems, 5(12). DOI: 10.1002/aisy.202300125. Online publication date: 28-Sep-2023.
  • (2021) Photonic Matrix Computing: From Fundamentals to Applications. Nanomaterials, 11(7), 1683. DOI: 10.3390/nano11071683. Online publication date: 26-Jun-2021.
  • (2021) Combinatorial optimization by weight annealing in memristive Hopfield networks. Scientific Reports, 11(1). DOI: 10.1038/s41598-020-78944-5. Online publication date: 12-Aug-2021.
  • (2021) In-memory computing with emerging nonvolatile memory devices. Science China Information Sciences, 64(12). DOI: 10.1007/s11432-021-3327-7. Online publication date: 4-Nov-2021.
  • (2020) Emerging neural workloads and their impact on hardware. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 1462-1471. DOI: 10.5555/3408352.3408685. Online publication date: 9-Mar-2020.
  • (2020) Efficient Mixed-Signal Neurocomputing Via Successive Integration and Rescaling. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28(3), 823-827. DOI: 10.1109/TVLSI.2019.2946516. Online publication date: 24-Feb-2020.
  • (2020) In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(1), 155-165. DOI: 10.1109/TCSI.2019.2945617. Online publication date: Jan-2020.
