Performance investigation of packet-based communication in 3D-memories

The Journal of Supercomputing

Abstract

Recent advances in 3D fabrication have enabled the development of 3D memory stacked over the logic die. 3D memory, which consists of stacked DRAM layers connected by Through-Silicon Vias (TSVs), is a viable solution to the memory wall problem. In the future, data-intensive applications on memory-centric network architectures will rely on packet-based communication to ensure scalability and reliable data transfers. This paper studies the performance of 3D memory that uses a packet-based protocol for communication between the CPU and off-chip memory. Our study provides insight into the internal flit traffic of different 3D memory configurations under diverse memory access patterns and workload characteristics. We use CasHMC to capture the performance effects of the packet-based communication protocol and integrate it with the gem5 simulator to study workloads from the Rodinia benchmark suite. Our evaluation focuses on the following metrics: total memory bandwidth utilization, off-chip link bandwidth utilization, latency, and power consumption. We examine the performance characteristics of the 3D stacked memory as the number of banks and vaults in the structure varies. We also study the effect of varying the packet size and the number of communication links on off-chip link bandwidth and latency, and examine different off-chip link power optimization strategies. Finally, we observe the impact of varying buffer sizes on latency at the off-chip link buffer and at the vault buffer of the 3D memory. Our study offers perspective on further development of data-centric computing architectures and insight into proper flit management strategies for future memory architectures.
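To make the relationship between packet size and off-chip link utilization concrete, the following is a minimal sketch of flit accounting for a packet-based memory link. It is not the paper's model or the CasHMC implementation. It assumes the Hybrid Memory Cube convention of a 16-byte (128-bit) flit and one flit of header-plus-tail overhead per packet; the function names `packet_flits` and `link_efficiency` are hypothetical and chosen only for illustration.

```python
FLIT_BYTES = 16      # assumption: 128-bit flit, as in the HMC specification
OVERHEAD_FLITS = 1   # assumption: 64-bit header + 64-bit tail = one flit


def packet_flits(payload_bytes: int) -> int:
    """Number of flits in a packet carrying payload_bytes of data.

    A packet with no data payload (e.g. a read request) is overhead only.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    data_flits = -(-payload_bytes // FLIT_BYTES)  # ceiling division
    return OVERHEAD_FLITS + data_flits


def link_efficiency(payload_bytes: int) -> float:
    """Fraction of the bytes sent on the link that are data payload."""
    total_bytes = packet_flits(payload_bytes) * FLIT_BYTES
    return payload_bytes / total_bytes


if __name__ == "__main__":
    # Larger payloads amortize the fixed per-packet overhead.
    for size in (16, 32, 64, 128):
        print(size, packet_flits(size), round(link_efficiency(size), 3))
```

Under these assumptions, the fixed one-flit overhead is amortized over more data as the payload grows, which is one reason packet size affects off-chip link bandwidth utilization.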

Author information

Corresponding author

Correspondence to Shubhang Pandey.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pandey, S., Venkatesh, T.G. Performance investigation of packet-based communication in 3D-memories. J Supercomput 78, 19070–19096 (2022). https://doi.org/10.1007/s11227-022-04605-1

  • DOI: https://doi.org/10.1007/s11227-022-04605-1
