
A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13511)


Abstract

Recently, there has been a surge in DRAM-based Processing in Memory (PIM) publications from academia and industry. The architectures and design techniques proposed in these publications vary widely. They range from integrating computation units in the DRAM IO region (i.e., without modifying the DRAM core circuits) to modifying the highly optimized DRAM sub-arrays inside the banks to perform computation. The underlying memory type used for DRAM-PIM, e.g., DDR4, LPDDR4, GDDR6 or HBM2, also differs. This paper assesses the architectural design decisions adopted across DRAM-PIM publications. Our study presents an in-depth analysis of where the computation units are placed, from the chip level down to the DRAM sub-array level. It also discusses the challenges of implementing computation units in the various regions of commodity DRAM architectures. We also elaborate on the architectural bottlenecks that limit how DRAM-PIM performance and energy gains scale, and present architectural approaches to address them. Finally, our assessment covers other important design dimensions, such as computation data formats and DRAM-PIM memory controller design.



Acknowledgment

This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under grant 16KISK004 (Open6GHuB), the Carl Zeiss Foundation under grant “Sustainable Embedded AI”, and the European Commission (EC) under grant 952091 (ALMA).

Author information

Corresponding author

Correspondence to Chirag Sudarshan.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sudarshan, C., Sadi, M.H., Steiner, L., Weis, C., Wehn, N. (2022). A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions. In: Orailoglu, A., Reichenbach, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_23


  • DOI: https://doi.org/10.1007/978-3-031-15074-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15073-9

  • Online ISBN: 978-3-031-15074-6

  • eBook Packages: Computer Science, Computer Science (R0)
