DOI: 10.1145/3583781.3590313
GLSVLSI Conference Proceedings · Short paper · Public Access

FlutPIM: A Look-up Table-based Processing-in-Memory Architecture with Floating-point Computation Support for Deep Learning Applications

Published: 05 June 2023

Abstract

Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially deep learning and AI. However, it is challenging to replicate the computational sophistication of a standard processor (i.e., a CPU or GPU) within the limited scope of a memory chip without incurring significant circuit overhead. To address this challenge, we propose a programmable, LUT-based, area-efficient PIM architecture capable of performing various low-precision floating-point (FP) computations using a novel LUT-oriented operand-decomposition technique. We incorporate these compact computational units within the memory banks in large numbers to achieve parallel-processing capability up to 4x higher than state-of-the-art FP-capable PIM. Additionally, we adopt a highly optimized low-precision FP format that maximizes computational performance with minimal compromise in precision, especially for deep learning applications. The overall result is 17% higher throughput and an 8-20x higher compute bandwidth per bank compared to the state of the art in in-memory acceleration.
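The paper's exact operand-decomposition scheme and FP format are not reproduced on this page, but the general technique behind LUT-based PIM multipliers can be illustrated: a wide multiplication is decomposed into narrow partial products small enough to be served by a lookup table, then recombined with shift-adds. The sketch below is a minimal illustration assuming a 4-bit decomposition of an 8-bit integer multiply; the function name `lut_mul8` and the 4-bit granularity are illustrative assumptions, not the paper's design.

```python
# Illustrative sketch of LUT-oriented operand decomposition (assumed
# 4-bit granularity): an 8-bit x 8-bit multiply becomes four 4-bit x
# 4-bit partial products, each answered by a 256-entry lookup table,
# then recombined with weighted shift-adds.

# 256-entry LUT holding every possible 4-bit x 4-bit product.
MUL4_LUT = [[a * b for b in range(16)] for a in range(16)]

def lut_mul8(x: int, y: int) -> int:
    """Multiply two 8-bit unsigned operands using only 4-bit LUT lookups."""
    xh, xl = x >> 4, x & 0xF   # high and low nibbles of x
    yh, yl = y >> 4, y & 0xF   # high and low nibbles of y
    # Recombine the four partial products at their bit-weight offsets.
    return ((MUL4_LUT[xh][yh] << 8)
            + (MUL4_LUT[xh][yl] << 4)
            + (MUL4_LUT[xl][yh] << 4)
            + MUL4_LUT[xl][yl])
```

For floating point, a LUT multiplier of this kind would typically handle the mantissa product while exponents are summed with a small adder; the paper's actual decomposition width and low-precision FP format may differ from this sketch.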


Cited By

  • (2024) "Reconfigurable Processing-in-Memory Architecture for Data Intensive Applications," 2024 37th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID), pp. 222-227. DOI: 10.1109/VLSID60093.2024.00043. Online publication date: 6 Jan 2024.
  • (2024) "A High Throughput, Energy-Efficient Architecture for Variable Precision Computing in DRAM," 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1-6. DOI: 10.1109/VLSI-SoC62099.2024.10767834. Online publication date: 6 Oct 2024.
  • (2024) "ReApprox-PIM: Reconfigurable Approximate Lookup-Table (LUT)-Based Processing-in-Memory (PIM) Machine Learning Accelerator," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(8), pp. 2288-2300. DOI: 10.1109/TCAD.2024.3367822. Online publication date: Aug 2024.


Published In

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
June 2023 · 731 pages
ISBN: 9798400701252
DOI: 10.1145/3583781

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. deep learning
    2. dram
    3. floating point
    4. processing in memory

    Qualifiers

    • Short-paper

    Funding Sources

    • US National Science Foundation

    Conference

    GLSVLSI '23
    Sponsor:
    GLSVLSI '23: Great Lakes Symposium on VLSI 2023
    June 5 - 7, 2023
Knoxville, TN, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%


    Article Metrics

• Downloads (last 12 months): 119
• Downloads (last 6 weeks): 20

Reflects downloads up to 25 Feb 2025.
