Abstract
Recently, we have been witnessing a surge of DRAM-based Processing-in-Memory (PIM) publications from academia and industry. The architectures and design techniques proposed in these publications vary widely, ranging from the integration of computation units in the DRAM I/O region (i.e., without modifying the DRAM core circuits) to modifications of the highly optimized DRAM sub-arrays inside the banks to perform computation. In addition, the underlying memory type of these DRAM-PIMs differs, e.g., DDR4, LPDDR4, GDDR6, or HBM2. This paper presents an assessment of the architectural design decisions adopted in DRAM-PIM publications. Our study provides an in-depth analysis of the computation unit placement, from the chip level down to the DRAM sub-array level, and discusses the implementation challenges of a computation unit in the various regions of commodity DRAM architectures. We also elaborate on the architectural bottlenecks that limit the scalability of DRAM-PIM performance and energy gains, and present architectural approaches to address these issues. Finally, our assessment covers further important design dimensions, such as computation data formats and DRAM-PIM memory controller design.
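To make the placement trade-off concrete, the following is a minimal back-of-the-envelope sketch in Python (not from the paper; every device parameter is an illustrative assumption) estimating the aggregate internal bandwidth available to computation units placed at the I/O region, at the bank level, and at the sub-array level of a hypothetical DRAM device:

# Back-of-the-envelope model of the aggregate internal bandwidth that a
# PIM computation unit can tap at different placement levels. All device
# parameters below are illustrative assumptions, not figures from the paper.

IO_BANDWIDTH_GBPS = 25.6      # assumed chip I/O bandwidth of one channel
BANKS_PER_CHIP = 16           # assumed number of banks per chip
SUBARRAYS_PER_BANK = 64       # assumed number of sub-arrays per bank
ROW_BUFFER_BYTES = 1024       # assumed row buffer (page) size
T_RC_NS = 45.0                # assumed row cycle time (activate + restore)

def row_stream_gbps(parallel_units: int) -> float:
    """Bandwidth if `parallel_units` units each consume one full row per tRC."""
    return parallel_units * ROW_BUFFER_BYTES / T_RC_NS  # bytes/ns == GB/s

print(f"I/O-region PIM : {IO_BANDWIDTH_GBPS:8.1f} GB/s (capped by the chip interface)")
print(f"Bank-level PIM : {row_stream_gbps(BANKS_PER_CHIP):8.1f} GB/s")
print(f"Sub-array PIM  : {row_stream_gbps(BANKS_PER_CHIP * SUBARRAYS_PER_BANK):8.1f} GB/s")

Even under these rough assumptions, the model illustrates why moving computation closer to the sub-arrays promises orders of magnitude more internal bandwidth, while also hinting at the power, area, and design-rule constraints that make modifying the DRAM core circuits so challenging.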
Acknowledgment
This work was partly funded by the German Federal Ministry of Education and Research (BMBF) under grant 16KISK004 (Open6GHuB), by the Carl Zeiss Foundation under the grant "Sustainable Embedded AI", and by the European Commission (EC) under grant 952091 (ALMA).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sudarshan, C., Sadi, M.H., Steiner, L., Weis, C., Wehn, N. (2022). A Critical Assessment of DRAM-PIM Architectures - Trends, Challenges and Solutions. In: Orailoglu, A., Reichenbach, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2022. Lecture Notes in Computer Science, vol 13511. Springer, Cham. https://doi.org/10.1007/978-3-031-15074-6_23
Print ISBN: 978-3-031-15073-9
Online ISBN: 978-3-031-15074-6