HBM-Resident Prefetching for Heterogeneous Memory System

Islam, Mahzabeen; Kavi, Krishna M.; Meswani, Mitesh; Banerjee, Soumik; Jayasena, Nuwan

doi:10.1007/978-3-319-54999-6_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10172)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

To meet the increasing demands for very large memory capacities, bandwidth and energy efficiency, researchers are exploring heterogeneous memory systems that combine faster 3D-DRAMs, DDRx DRAM and non-volatile memories (NVMs). In this paper we evaluate prefetching in a flat-addressable heterogeneous memory comprising High Bandwidth Memory (HBM) and phase change memory (PCM). We find that large prefetch buffers (64 MB) can outperform smaller ones (2 MB); however, it is not feasible to place such large buffers on the processor die. Hence, we evaluate an HBM-resident prefetch buffer that provides larger capacity and takes advantage of HBM’s higher memory bandwidth. We also present new prefetching policies that accommodate the differences in data path compared to traditional prefetchers. We show that reserving a small fraction (1/16th) of HBM to host a hardware prefetch buffer can improve IPC for a set of SPEC CPU2006 and HPC benchmarks by an average of 34% and a maximum of 98% over a baseline system with no prefetching. Prefetching reduces total PCM traffic by 10% on average, which results in more memory traffic to the faster HBM, providing overall performance improvement. We found that such prefetching outperforms the CAMEO and Alloy cache schemes on average by 60% and 10%, respectively.
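
The idea of a prefetch buffer carved out of HBM can be illustrated with a toy model. The sketch below is not the paper's design: the abstract does not describe its prefetching policies, so a simple next-line prefetcher with LRU replacement stands in for them. The 64-byte block size, the prefetch degree, the 1 GiB HBM capacity used in the sizing helper, and all names (PrefetchBuffer, buffer_capacity_blocks, run_stream) are assumptions made only for illustration.

```python
from collections import OrderedDict

BLOCK_BYTES = 64  # assumed block size; not taken from the paper


def buffer_capacity_blocks(hbm_bytes, divisor=16):
    """Blocks that fit in 1/divisor of HBM (the abstract reserves 1/16th)."""
    return hbm_bytes // divisor // BLOCK_BYTES


class PrefetchBuffer:
    """Toy LRU buffer of PCM block numbers, imagined to live in reserved HBM.

    Stand-in policy (an assumption): on every PCM access, prefetch the
    next `degree` sequential blocks into the buffer.
    """

    def __init__(self, capacity_blocks, degree=2):
        self.capacity = capacity_blocks
        self.degree = degree
        self.blocks = OrderedDict()  # block number -> True, kept in LRU order
        self.hits = 0    # PCM accesses served from the HBM-resident buffer
        self.misses = 0  # PCM accesses that must go to PCM itself

    def _insert(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # refresh recency
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True

    def access(self, pcm_addr):
        """Record one demand access to a PCM address; return True on a hit."""
        block = pcm_addr // BLOCK_BYTES
        hit = block in self.blocks
        if hit:
            self.hits += 1
            self.blocks.move_to_end(block)
        else:
            self.misses += 1
        for i in range(1, self.degree + 1):
            self._insert(block + i)
        return hit


def run_stream(buffer, addresses):
    """Feed a sequence of PCM addresses through the buffer."""
    for addr in addresses:
        buffer.access(addr)
    return buffer.hits, buffer.misses
```

With a hypothetical 1 GiB HBM, buffer_capacity_blocks(1 << 30) gives the number of 64-byte blocks in a 1/16th reservation. Feeding a sequential stream through run_stream shows the intended effect in miniature: after the first miss, later accesses are served from the buffer rather than from PCM. Real workloads, and the policies evaluated in the paper, are of course more complex than this stand-in.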

M. Meswani—The author did the work while employed at AMD.

Author information

Authors and Affiliations

University of North Texas, Denton, USA
Mahzabeen Islam & Krishna M. Kavi
ARM, Austin, USA
Mitesh Meswani
Advanced Micro Devices, Inc., Austin, USA
Soumik Banerjee & Nuwan Jayasena

Corresponding authors

Correspondence to Mahzabeen Islam or Krishna M. Kavi.

Editor information

Editors and Affiliations

Vienna University of Technology, Vienna, Austria
Jens Knoop
Karlsruhe Institute of Technology, Karlsruhe, Germany
Wolfgang Karl
Lawrence Livermore National Laboratory, Livermore, USA
Martin Schulz
Kyushu University, Fukuoka, Japan
Koji Inoue
Otto-von-Guericke Universität Magdeburg, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Islam, M., Kavi, K.M., Meswani, M., Banerjee, S., Jayasena, N. (2017). HBM-Resident Prefetching for Heterogeneous Memory System. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science, vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_10

DOI: https://doi.org/10.1007/978-3-319-54999-6_10
Published: 04 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54998-9
Online ISBN: 978-3-319-54999-6
eBook Packages: Computer Science, Computer Science (R0)
