DOI: 10.1145/3583781.3590313
GLSVLSI Conference Proceedings · Short paper · Public Access

FlutPIM: A Look-up Table-based Processing-in-Memory Architecture with Floating-point Computation Support for Deep Learning Applications

Published: 05 June 2023

Abstract

Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially deep learning and AI. However, it is challenging to replicate the computational sophistication of a standard processor (i.e., a CPU or GPU) within the limited scope of a memory chip without incurring significant circuit overhead. To address this challenge, we propose a programmable, LUT-based, area-efficient PIM architecture capable of performing various low-precision floating-point (FP) computations using a novel LUT-oriented operand-decomposition technique. We incorporate these compact computational units within the memory banks in large numbers to achieve parallel-processing capability up to 4x higher than state-of-the-art FP-capable PIM. Additionally, we adopt a highly optimized low-precision FP format that maximizes computational performance with minimal compromise in precision, especially for deep learning applications. The overall result is 17% higher throughput and an 8-20x higher compute bandwidth per bank compared to the state of the art in in-memory acceleration.
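The paper's exact operand-decomposition scheme and FP format are not reproduced on this page, but the general technique behind LUT-based PIM multipliers can be illustrated: a wide multiplication is decomposed into narrow partial products small enough to be served by a lookup table, then recombined with shift-adds. The sketch below is a minimal illustration assuming a 4-bit decomposition of an 8-bit integer multiply; the function name `lut_mul8` and the 4-bit granularity are illustrative assumptions, not the paper's design.

```python
# Illustrative sketch of LUT-oriented operand decomposition (assumed
# 4-bit granularity): an 8-bit x 8-bit multiply becomes four 4-bit x
# 4-bit partial products, each answered by a 256-entry lookup table,
# then recombined with weighted shift-adds.

# 256-entry LUT holding every possible 4-bit x 4-bit product.
MUL4_LUT = [[a * b for b in range(16)] for a in range(16)]

def lut_mul8(x: int, y: int) -> int:
    """Multiply two 8-bit unsigned operands using only 4-bit LUT lookups."""
    xh, xl = x >> 4, x & 0xF   # high and low nibbles of x
    yh, yl = y >> 4, y & 0xF   # high and low nibbles of y
    # Recombine the four partial products at their bit-weight offsets.
    return ((MUL4_LUT[xh][yh] << 8)
            + (MUL4_LUT[xh][yl] << 4)
            + (MUL4_LUT[xl][yh] << 4)
            + MUL4_LUT[xl][yl])
```

For floating point, a LUT multiplier of this kind would typically handle the mantissa product while exponents are summed with a small adder; the paper's actual decomposition width and low-precision FP format may differ from this sketch.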


Cited By

  • (2024) "Reconfigurable Processing-in-Memory Architecture for Data Intensive Applications," 2024 37th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID), pp. 222-227. DOI: 10.1109/VLSID60093.2024.00043. Online publication date: 6 Jan 2024.
  • (2024) "A High Throughput, Energy-Efficient Architecture for Variable Precision Computing in DRAM," 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1-6. DOI: 10.1109/VLSI-SoC62099.2024.10767834. Online publication date: 6 Oct 2024.
  • (2024) "ReApprox-PIM: Reconfigurable Approximate Lookup-Table (LUT)-Based Processing-in-Memory (PIM) Machine Learning Accelerator," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(8), pp. 2288-2300. DOI: 10.1109/TCAD.2024.3367822. Online publication date: Aug 2024.


Published In

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
June 2023 · 731 pages
ISBN: 9798400701252
DOI: 10.1145/3583781

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. deep learning
    2. dram
    3. floating point
    4. processing in memory

    Qualifiers

    • Short-paper

    Funding Sources

    • US National Science Foundation

    Conference

    GLSVLSI '23
    Sponsor:
    GLSVLSI '23: Great Lakes Symposium on VLSI 2023
    June 5 - 7, 2023
Knoxville, TN, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%


    Article Metrics

• Downloads (last 12 months): 119
• Downloads (last 6 weeks): 20

Reflects downloads up to 25 Feb 2025.
