Architecting HBM as a high bandwidth, high capacity, self-managed last-level cache

ABSTRACT
The growing number of on-chip cores in today's multi-core processors has increased the demand for memory bandwidth and capacity, but off-chip DRAM is not scaling at the rate this growth requires. Stacked DRAM last-level caches have been proposed to alleviate the bandwidth constraint; however, many of these designs are impractical for real systems or do not exploit the features available in today's stacked DRAM variants.
In this paper, we design a last-level stacked DRAM cache that is practical for real-world systems and takes advantage of High Bandwidth Memory (HBM) [1]. Our HBM cache requires only one minor change to existing memory controllers to support communication, and it uses HBM's built-in logic die to handle tag storage and lookups. We also introduce a novel tag/data storage organization that enables faster lookups, higher associativity, and more capacity than previous designs.
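The abstract does not describe the tag organization in detail. As a rough, hypothetical illustration of the kind of tag check a set-associative stacked-DRAM cache's logic die would perform on each access, consider this minimal sketch; all names and parameters (line size, associativity, set count) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the tag side of a set-associative DRAM cache.
# All parameters are hypothetical; in the design described above, the
# tag storage and lookup logic would reside on the HBM logic die.

LINE_BYTES = 64        # cache line size in bytes (assumed)
WAYS = 4               # associativity (assumed)
SETS = 1 << 15         # number of sets (assumed)

def split_address(addr: int):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    line = addr // LINE_BYTES
    return line // SETS, line % SETS, offset

class TagStore:
    """Tag array consulted before touching cache data: a hit avoids a
    trip to off-chip DRAM, a miss triggers a fill."""
    def __init__(self):
        self.sets = [set() for _ in range(SETS)]   # tags present per set

    def lookup(self, addr: int) -> bool:
        tag, set_idx, _ = split_address(addr)
        return tag in self.sets[set_idx]

    def fill(self, addr: int) -> None:
        tag, set_idx, _ = split_address(addr)
        tags = self.sets[set_idx]
        if tag not in tags and len(tags) >= WAYS:
            tags.pop()                 # placeholder (arbitrary) eviction
        tags.add(tag)

cache = TagStore()
cache.fill(0x1234_5678)
assert cache.lookup(0x1234_5678)       # same line: hit
assert not cache.lookup(0x9999_0000)   # different line: miss
```

The real design must also decide where these tags physically live; co-locating tag and data in the same DRAM row (as several prior DRAM-cache proposals do) lets one row activation serve both the tag check and the data read.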
REFERENCES

- [1] JEDEC Standard, "High Bandwidth Memory (HBM) DRAM," JESD235A, 2015.
- [2] M. K. Qureshi and G. H. Loh, "Fundamental latency tradeoff in architecting DRAM caches: Outperforming impractical SRAM-tags with a simple and practical design," International Symposium on Microarchitecture (MICRO), 2012, pp. 235--246.
- [3] D. Milojevic, S. Idgunji, D. Jevdjic, E. Ozer, P. Lotfi-Kamran, A. Panteli, A. Prodromou, C. Nicopoulos, D. Hardy, B. Falsafi et al., "Thermal characterization of cloud workloads on a power-efficient server-on-chip," International Conference on Computer Design (ICCD), 2012, pp. 175--182.
- [4] M. R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, and G. Loh, "Heterogeneous Memory Architectures: A HW/SW Approach for Mixing Die-stacked and Off-package Memories," International Symposium on High Performance Computer Architecture (HPCA), 2015.
- [5] S. Mittal and J. S. Vetter, "A Survey of Techniques for Architecting DRAM Caches," IEEE Transactions on Parallel and Distributed Systems, 2015.
- [6] R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd, "Power7: IBM's Next-Generation Server Processor," IEEE Micro, vol. 30, no. 2, pp. 7--15, 2010.
- [7] M.-T. Chang, P. Rosenfeld, S.-L. Lu, and B. Jacob, "Technology Comparison for Large Last-Level Caches (L3Cs): Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized eDRAM," International Symposium on High Performance Computer Architecture (HPCA), 2013.
- [8] Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, "A case for exploiting subarray-level parallelism (SALP) in DRAM," International Symposium on Computer Architecture (ISCA), 2012, pp. 368--379.
- [9] (2014). [Online]. Available: http://wccftech.com/intel-xeon-phiknights-landing-processors-stacked-dram-hmc-16gb/
- [10] (2015). [Online]. Available: http://www.amd.com/en-us/innovations/software-technologies/hbm
- [11] B. Pourshirazi and Z. Zhu, "Refree: A Refresh-Free Hybrid DRAM/PCM Main Memory System," International Parallel and Distributed Processing Symposium (IPDPS), 2016, pp. 566--575.
- [12] N. Gulur, M. Mehendale, R. Manikantan, and R. Govindarajan, "Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth," International Symposium on Microarchitecture (MICRO), 2014, pp. 38--50.
- [13] L. Zhao, R. Iyer, R. Illikkal, and D. Newell, "Exploring DRAM cache architectures for CMP server platforms," International Conference on Computer Design (ICCD), 2007, pp. 55--62.
- [14] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1--7, 2011.
- [15] M. Poremba, T. Zhang, and Y. Xie, "NVMain 2.0: Architectural Simulator to Model (Non-)Volatile Memory Systems," IEEE Computer Architecture Letters (CAL), 2015.
- [16] O. Naji, A. Hansson, C. Weis, M. Jung, and N. Wehn, "A High-Level DRAM Timing, Power and Area Exploration Tool," International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2015.
- [17] JEDEC Standard, "DDR4 SDRAM Standard," JESD79-4A, 2013.
- [18] P. K. Tschirhart, "Multi-Level Main Memory Systems: Technology Choices, Design Considerations, and Trade-off Analysis," 2015.
- [19] C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC benchmark suite: characterization and architectural implications," Parallel Architectures and Compilation Techniques (PACT), 2008, pp. 72--81.
- [20] D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga, "The NAS Parallel Benchmarks," International Journal of High Performance Computing Applications, vol. 5, no. 3, pp. 63--73, 1991.