Abstract
Recently, we have been witnessing a surge of DRAM-based Processing-in-Memory (PIM) publications from academia and industry. The architectures and design techniques proposed in these publications vary widely, ranging from the integration of computation units in the DRAM I/O region (i.e., without modifying the DRAM core circuits) to modifications of the highly optimized DRAM sub-arrays inside the banks to perform computation. In addition, the underlying memory type of these DRAM-PIMs differs, e.g., DDR4, LPDDR4, GDDR6, or HBM2. This paper presents an assessment of the architectural design decisions adopted in DRAM-PIM publications. Our study provides an in-depth analysis of the computation unit placement, from the chip level down to the DRAM sub-array level, and discusses the implementation challenges of a computation unit in the various regions of commodity DRAM architectures. We also elaborate on the architectural bottlenecks that limit the scalability of DRAM-PIM performance and energy gains, and present architectural approaches to address these issues. Finally, our assessment covers further important design dimensions, such as computation data formats and DRAM-PIM memory controller design.
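To make the placement trade-off concrete, the following is a minimal back-of-the-envelope sketch in Python (not from the paper; every device parameter is an illustrative assumption) estimating the aggregate internal bandwidth available to computation units placed at the I/O region, at the bank level, and at the sub-array level of a hypothetical DRAM device:

# Back-of-the-envelope model of the aggregate internal bandwidth that a
# PIM computation unit can tap at different placement levels. All device
# parameters below are illustrative assumptions, not figures from the paper.

IO_BANDWIDTH_GBPS = 25.6      # assumed chip I/O bandwidth of one channel
BANKS_PER_CHIP = 16           # assumed number of banks per chip
SUBARRAYS_PER_BANK = 64       # assumed number of sub-arrays per bank
ROW_BUFFER_BYTES = 1024       # assumed row buffer (page) size
T_RC_NS = 45.0                # assumed row cycle time (activate + restore)

def row_stream_gbps(parallel_units: int) -> float:
    """Bandwidth if `parallel_units` units each consume one full row per tRC."""
    return parallel_units * ROW_BUFFER_BYTES / T_RC_NS  # bytes/ns == GB/s

print(f"I/O-region PIM : {IO_BANDWIDTH_GBPS:8.1f} GB/s (capped by the chip interface)")
print(f"Bank-level PIM : {row_stream_gbps(BANKS_PER_CHIP):8.1f} GB/s")
print(f"Sub-array PIM  : {row_stream_gbps(BANKS_PER_CHIP * SUBARRAYS_PER_BANK):8.1f} GB/s")

Even under these rough assumptions, the model illustrates why moving computation closer to the sub-arrays promises orders of magnitude more internal bandwidth, while also hinting at the power, area, and design-rule constraints that make modifying the DRAM core circuits so challenging.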
Acknowledgment
This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under grant 16KISK004 (Open6GHuB), by the Carl Zeiss Foundation under the grant "Sustainable Embedded AI", and by the European Commission (EC) under grant 952091 (ALMA).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sudarshan, C., Sadi, M.H., Steiner, L., Weis, C., Wehn, N. (2022). A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions. In: Orailoglu, A., Reichenbach, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_23
Print ISBN: 978-3-031-15073-9
Online ISBN: 978-3-031-15074-6