
HBM-Resident Prefetching for Heterogeneous Memory System

  • Conference paper
Architecture of Computing Systems - ARCS 2017 (ARCS 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10172)

Abstract

To meet the increasing demands for very large memory capacity, bandwidth, and energy efficiency, researchers are exploring heterogeneous memory systems that combine faster 3D-DRAMs, DDRx DRAM, and non-volatile memories (NVMs). In this paper we evaluate prefetching in a flat-addressable heterogeneous memory comprising High Bandwidth Memory (HBM) and phase change memory (PCM). We find that large prefetch buffers (64 MB) can outperform smaller ones (2 MB); however, it is not feasible to place such large buffers on the processor die. Hence, we evaluate an HBM-resident prefetch buffer that provides larger capacity and takes advantage of HBM's higher memory bandwidth. We also present new prefetching policies that accommodate the differences in data path compared to traditional prefetchers. We show that reserving a small fraction (1/16th) of HBM to host a hardware prefetch buffer can improve IPC for a set of SPEC CPU2006 and HPC benchmarks by an average of 34% and a maximum of 98% over a baseline system with no prefetching. Prefetching reduces total PCM traffic by 10% on average and shifts more memory traffic to the faster HBM, improving overall performance. We found that such prefetching outperforms the CAMEO and Alloy cache schemes by 60% and 10% on average, respectively.

M. Meswani—The author did the work while employed at AMD.
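The idea of reserving a fraction of fast memory as a prefetch buffer can be illustrated with a toy model. The sketch below is not the paper's design or any of its prefetching policies: the 64-byte block size, the LRU replacement, the next-line prefetch on a miss, and all names (`reserved_buffer_blocks`, `PrefetchBuffer`, `degree`) are assumptions made only for illustration. It counts how many demand reads are served from the buffer rather than from slow memory, and it does not model timing or total PCM traffic.

```python
from collections import OrderedDict

BLOCK_BYTES = 64  # assumed block (cache-line) size, not taken from the paper


def reserved_buffer_blocks(hbm_bytes, fraction=1 / 16):
    """Blocks available when `fraction` of the fast memory is reserved."""
    return int(hbm_bytes * fraction) // BLOCK_BYTES


class PrefetchBuffer:
    """Toy LRU buffer standing in for a prefetch buffer held in fast memory."""

    def __init__(self, capacity_blocks, degree=2):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be at least 1")
        self.capacity = capacity_blocks
        self.degree = degree          # blocks fetched ahead on a miss (assumed)
        self.blocks = OrderedDict()   # block number -> None, oldest first
        self.buffer_hits = 0          # demand reads served from the buffer
        self.slow_reads = 0           # demand reads served from slow memory
        self.prefetches = 0           # blocks brought into the buffer

    def _insert(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = None
        self.prefetches += 1

    def read(self, address):
        """Serve one demand read and prefetch the next blocks on a miss."""
        block = address // BLOCK_BYTES
        if block in self.blocks:
            self.blocks.move_to_end(block)
            self.buffer_hits += 1
            return "buffer"
        self.slow_reads += 1
        for offset in range(1, self.degree + 1):
            self._insert(block + offset)
        return "slow"


if __name__ == "__main__":
    # Hypothetical sizes chosen only to keep the example small.
    buf = PrefetchBuffer(reserved_buffer_blocks(1024 * 1024), degree=2)
    for i in range(12):
        buf.read(i * BLOCK_BYTES)
    print(buf.buffer_hits, buf.slow_reads, buf.prefetches)
```

With a purely sequential stream and a prefetch degree of 2, every third demand read misses and the two that follow are served from the buffer. Real workloads are far less regular, which is why buffer capacity and prefetching policy matter.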



Correspondence to Mahzabeen Islam or Krishna M. Kavi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Islam, M., Kavi, K.M., Meswani, M., Banerjee, S., Jayasena, N. (2017). HBM-Resident Prefetching for Heterogeneous Memory System. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science, vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_10

  • DOI: https://doi.org/10.1007/978-3-319-54999-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54998-9

  • Online ISBN: 978-3-319-54999-6

  • eBook Packages: Computer Science (R0)
