DOI: 10.1145/3508352.3549469
Research Article · Public Access

ReD-LUT: Reconfigurable In-DRAM LUTs Enabling Massive Parallel Computation

Published: 22 December 2022

Abstract

In this paper, we propose a reconfigurable processing-in-DRAM architecture named ReD-LUT that leverages the high density of commodity main memory to enable flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication, division) via memory read operations only. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored in the same bit-line, going beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally intensive applications: low-precision deep learning acceleration and Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4× compared with the GPU and achieves ~37.8× speedup and 2.1× higher energy efficiency over the best in-DRAM bit-wise accelerators. For AES data encryption, it reduces energy consumption by a factor of ~2.2× compared to an ASIC implementation.
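The core LUT-query idea from the abstract can be sketched in a few lines: a complex operation (here, 4-bit multiplication) is precomputed once into a table, after which each "computation" is a single memory read indexed by the concatenated operands. This is a minimal illustrative sketch assuming a flat row-major table layout; the paper's actual in-DRAM row mapping and query mechanism differ, and the function names here are hypothetical.

```python
def build_mult_lut(bits=4):
    """Precompute products for every pair of `bits`-wide operands.

    For bits=4 this yields a 256-entry table, small enough to
    replicate across DRAM sub-arrays in a LUT-based design.
    """
    n = 1 << bits
    return [a * b for a in range(n) for b in range(n)]

def lut_multiply(lut, a, b, bits=4):
    """'Compute' a * b with one table lookup: the concatenated
    operand bits form the read address."""
    return lut[(a << bits) | b]

lut = build_mult_lut()
assert lut_multiply(lut, 7, 9) == 63
assert lut_multiply(lut, 15, 15) == 225
```

The appeal in a processing-in-DRAM context is that many such lookups can be issued across sub-arrays in parallel, so arithmetic throughput scales with memory-level parallelism rather than with dedicated ALU count.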


Cited By

  • Aligner-D: Leveraging In-DRAM Computing to Accelerate DNA Short Read Alignment. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13(1):332-343, Mar 2023. DOI: 10.1109/JETCAS.2023.3241545

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN:9781450392174
DOI:10.1145/3508352

In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 268
  • Downloads (last 6 weeks): 36
Reflects downloads up to 28 Feb 2025
