Performance investigation of packet-based communication in 3D-memories

Pandey, Shubhang; Venkatesh, T. G.

doi:10.1007/s11227-022-04605-1

Published: 15 June 2022

Volume 78, pages 19070–19096, (2022)

The Journal of Supercomputing


Abstract

Recent advances in 3D fabrication have enabled the development of 3D memory stacked over the logic die. 3D memory, in which stacked DRAM layers are connected by Through-Silicon Vias (TSVs), is a viable solution to the memory wall problem. In the future, data-intensive applications on memory-centric network architectures will rely on packet-based communication to ensure scalability and reliable data transfers. This paper studies the performance of 3D memory that uses a packet-based protocol for communication between the CPU and off-chip memory. Our study provides insight into the internal flit traffic for different configurations of 3D memory under diverse memory access patterns and workload characteristics. We use CasHMC to capture the performance effects of the packet-based communication protocol and have integrated it with the gem5 simulator to study workloads from the Rodinia benchmark suite. Our evaluation focuses on the following metrics: total memory bandwidth utilization, off-chip link bandwidth utilization, latency, and power consumption. We examine the performance characteristics of the 3D-stacked memory as the number of banks and vaults in the structure varies. We also study the effect of varying the packet size and the number of communication links on off-chip link bandwidth and latency, and we examine different off-chip link power optimization strategies. Finally, we observe the impact of varying buffer sizes on latency at the off-chip link buffer and at the vault buffer of the 3D memory. Our study offers perspective on the further development of data-centric computing architectures and insight into flit management strategies for future memory architectures.
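To illustrate why packet size matters for off-chip link bandwidth utilization, the following is a minimal sketch and not the authors' model. It assumes an HMC-style packet format in which packets are built from 16-byte flits and the header and tail together occupy one flit. The constants and function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the flit size and overhead are assumptions
# based on an HMC-style packet format, not values taken from the paper.
FLIT_BYTES = 16      # assumed flit size in bytes
OVERHEAD_FLITS = 1   # assumed: header and tail together fill one flit


def flits_per_packet(payload_bytes: int) -> int:
    """Return the flits needed for the payload plus header/tail overhead."""
    data_flits = -(-payload_bytes // FLIT_BYTES)  # ceiling division
    return data_flits + OVERHEAD_FLITS


def link_efficiency(payload_bytes: int) -> float:
    """Return the fraction of transmitted bytes that carry payload."""
    total_bytes = flits_per_packet(payload_bytes) * FLIT_BYTES
    return payload_bytes / total_bytes


if __name__ == "__main__":
    # Larger payloads amortize the fixed per-packet overhead.
    for size in (16, 32, 64, 128):
        print(size, flits_per_packet(size), f"{link_efficiency(size):.3f}")
```

Under these assumptions, a 64-byte payload needs five flits and uses 80% of the transmitted bytes for data, while a 16-byte payload uses only 50%. This fixed per-packet overhead is one reason packet size affects the usable link bandwidth.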



Author information

Authors and Affiliations

Electrical Engineering Department, Indian Institute of Technology Madras, Chennai, India
Shubhang Pandey & T. G. Venkatesh

Corresponding author

Correspondence to Shubhang Pandey.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Pandey, S., Venkatesh, T.G. Performance investigation of packet-based communication in 3D-memories. J Supercomput 78, 19070–19096 (2022). https://doi.org/10.1007/s11227-022-04605-1


Accepted: 12 May 2022
Published: 15 June 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11227-022-04605-1
