DOI: 10.1145/3149393.3149394
research-article

Architecting HBM as a high bandwidth, high capacity, self-managed last-level cache

Published: 12 November 2017

ABSTRACT

Due to the recent growth in the number of on-chip cores available in today's multi-core processors, there is an increased demand for memory bandwidth and capacity. However, off-chip DRAM is not scaling at the rate necessary to keep pace with the growth in the number of on-chip cores. Stacked DRAM last-level caches have been proposed to alleviate these bandwidth constraints; however, many of these ideas are not practical for real systems, or may not take advantage of the features available in today's stacked DRAM variants.

In this paper, we design a last-level, stacked DRAM cache that is practical for real-world systems and takes advantage of High Bandwidth Memory (HBM) [1]. Our HBM cache requires only one minor change to existing memory controllers to support communication. It uses HBM's built-in logic die to handle tag storage and lookups. We also introduce novel tag/data storage that enables faster lookups, associativity, and more capacity than previous designs.

References

  1. JEDEC Standard, "High Bandwidth Memory (HBM) DRAM", in JESD235A, 2015.
  2. M. K. Qureshi and G. H. Loh, "Fundamental latency tradeoff in architecting DRAM caches: Outperforming impractical SRAM-tags with a simple and practical design", International Symposium on Microarchitecture (MICRO), 2012, pp. 235--246.
  3. D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran, A. Panteli, A. Prodromou, C. Nicopoulos, D. Hardy, B. Falsafi et al., "Thermal characterization of cloud workloads on a power-efficient server-on-chip", International Conference on Computer Design (ICCD), 2012, pp. 175--182.
  4. M. R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, and G. Loh, "Heterogeneous Memory Architectures: A HW/SW Approach for Mixing Die-stacked and Off-package Memories", International Symposium on High Performance Computer Architecture (HPCA), 2015.
  5. S. Mittal and J. S. Vetter, "A Survey of Techniques for Architecting DRAM Caches", IEEE Transactions on Parallel and Distributed Systems, 2015.
  6. R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd, "Power7: IBM's Next-Generation Server Processor", IEEE Micro, 2010, vol. 30, no. 2, pp. 7--15.
  7. M.-T. Chang, P. Rosenfeld, S.-L. Lu, and B. Jacob, "Technology Comparison for Large Last-Level Caches (L3Cs): Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized eDRAM", International Symposium on High Performance Computer Architecture (HPCA), 2013.
  8. Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, "A case for exploiting subarray-level parallelism (SALP) in DRAM", International Symposium on Computer Architecture (ISCA), 2012, pp. 368--379.
  9. (2014). [Online]. Available: http://wccftech.com/intel-xeon-phiknights-landing-processors-stacked-dram-hmc-16gb/
  10. (2015). [Online]. Available: http://www.amd.com/en-us/innovations/software-technologies/hbm
  11. B. Pourshirazi and Z. Zhu, "Refree: A Refresh-Free Hybrid DRAM/PCM Main Memory System", International Parallel and Distributed Processing Symposium (IPDPS), 2016, pp. 566--575.
  12. N. Gulur, M. Mehendale, R. Manikantan, and R. Govindarajan, "Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth", International Symposium on Microarchitecture (MICRO), 2014, pp. 38--50.
  13. L. Zhao, R. Iyer, R. Illikkal, and D. Newell, "Exploring DRAM cache architectures for CMP server platforms", International Conference on Computer Design (ICCD), 2007, pp. 55--62.
  14. N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator", SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1--7, 2011.
  15. M. Poremba, T. Zhang, and Y. Xie, "NVMain 2.0: Architectural Simulator to Model (Non-)Volatile Memory Systems", Computer Architecture Letters (CAL), 2015.
  16. O. Naji, A. Hansson, C. Weis, M. Jung, and N. Wehn, "A High-Level DRAM Timing, Power and Area Exploration Tool", IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), 2015.
  17. JEDEC Standard, "DDR4 SDRAM Standard", in JESD79-4A, 2013.
  18. P. K. Tschirhart, "Multi-Level Main Memory Systems: Technology Choices, Design Considerations, and Trade-off Analysis", 2015.
  19. C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC benchmark suite: characterization and architectural implications", Parallel Architectures and Compilation Techniques (PACT), 2008, pp. 72--81.
  20. D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga, "The NAS Parallel Benchmarks", International Journal of High Performance Computing Applications, vol. 5, no. 3, pp. 63--73, 1991.

    • Published in

      PDSW-DISCS '17: Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems
      November 2017
      74 pages
      ISBN: 9781450351348
      DOI: 10.1145/3149393

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 17 of 41 submissions, 41%