research-article

ReD-LUT: Reconfigurable In-DRAM LUTs Enabling Massive Parallel Computation

Authors:

Shaahin Angizi

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

Article No.: 77, Pages 1 - 8

https://doi.org/10.1145/3508352.3549469

Published: 22 December 2022

Abstract

In this paper, we propose ReD-LUT, a reconfigurable processing-in-DRAM architecture that leverages the high density of commodity main memory to enable flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication and division) using only memory read operations. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored on the same bit-line, which is beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally intensive applications: low-precision deep learning acceleration and Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that, for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4 compared with the GPU and achieves ~37.8× speedup and 2.1× higher energy efficiency over the best in-DRAM bit-wise accelerators. For AES data encryption, it reduces energy consumption by a factor of ~2.2 compared to an ASIC implementation.
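The idea of replacing arithmetic with a table read can be illustrated in software. The sketch below is only an analogy under stated assumptions, not the ReD-LUT circuit or its data layout: the 4-bit operand width, the address encoding, and the names `build_mul_lut`, `lut_mul`, and `lut_mul8` are all hypothetical choices made for illustration. It precomputes every product of two 4-bit operands, answers a multiplication with a single indexed read, and then composes an 8-bit product from four such reads plus shifts and adds.

```python
# Illustrative software analogy only; not the ReD-LUT hardware or its row layout.
BITS = 4  # assumed operand width for this sketch


def build_mul_lut(bits=BITS):
    """Precompute every product of two `bits`-wide unsigned operands."""
    size = 1 << bits
    # The entry at address (a << bits) | b holds a * b, like one stored LUT word.
    return [a * b for a in range(size) for b in range(size)]


def lut_mul(lut, a, b, bits=BITS):
    """Multiply with a single table read instead of arithmetic logic."""
    return lut[(a << bits) | b]


def lut_mul8(lut, x, y):
    """Compose an 8-bit product from four 4-bit lookups plus shifts and adds."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return ((lut_mul(lut, xh, yh) << 8)
            + ((lut_mul(lut, xh, yl) + lut_mul(lut, xl, yh)) << 4)
            + lut_mul(lut, xl, yl))


MUL_LUT = build_mul_lut()
```

A call such as `lut_mul(MUL_LUT, 7, 9)` reads one table entry, while `lut_mul8(MUL_LUT, 200, 123)` performs four reads. The table grows exponentially with operand width (256 entries for two 4-bit operands), which is one reason a LUT approach pairs naturally with low-precision workloads such as the quantized deep learning case mentioned in the abstract.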

Cited By

F. Zhang, S. Angizi, J. Sun, W. Zhang, D. Fan (2023). Aligner-D: Leveraging In-DRAM Computing to Accelerate DNA Short Read Alignment. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13:1 (332-343). Online publication date: Mar-2023.
https://doi.org/10.1109/JETCAS.2023.3241545

Index Terms

ReD-LUT: Reconfigurable In-DRAM LUTs Enabling Massive Parallel Computation
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

October 2022

1467 pages

ISBN: 9781450392174

DOI: 10.1145/3508352

Conference Chair: Tulika Mitra, National University of Singapore

Program Chairs: Evangeline Young, The Chinese University of Hong Kong; Jinjun Xiong, University at Buffalo (UB)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE-EDS: Electronic Devices Society
IEEE CAS
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

ICCAD '22

Sponsor:

SIGDA

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design

October 30 - November 3, 2022

San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Bibliometrics

Total Citations: 1
Total Downloads: 472
Downloads (Last 12 months): 268
Downloads (Last 6 weeks): 36

Reflects downloads up to 28 Feb 2025
