Fine-grained DRAM: energy-efficient DRAM for extreme bandwidth systems

Authors:

Niladrish Chatterjee,

Aditya Agrawal,

Stephen W. Keckler,

William J. Dally

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 41 - 54

https://doi.org/10.1145/3123939.3124545

Published: 14 October 2017

Abstract

Future GPUs and other high-performance throughput processors will require multiple TB/s of bandwidth to DRAM. Satisfying this bandwidth demand within an acceptable energy budget is a challenge in these extreme bandwidth memory systems. We propose a new high-bandwidth DRAM architecture, Fine-Grained DRAM (FGDRAM), which improves bandwidth by 4× and improves the energy efficiency of DRAM by 2× relative to the highest-bandwidth, most energy-efficient contemporary DRAM, High Bandwidth Memory (HBM2). These benefits are in large measure achieved by partitioning the DRAM die into many independent units, called grains, each of which has a local, adjacent I/O. This approach allows the bandwidth of all the banks in the DRAM to be used simultaneously, eliminating the shared buses that interconnect the banks. Furthermore, the on-DRAM data movement energy is significantly reduced due to the much shorter wiring distance between the cell array and the local I/O. The FGDRAM architecture readily lends itself to existing techniques for reducing the effective DRAM row size in an area-efficient manner, which reduces wasteful row-activation energy in applications with low locality. In addition, when FGDRAM is paired with a memory controller optimized to exploit the additional concurrency provided by the independent grains, it improves GPU system performance by 19% over an iso-bandwidth and iso-capacity future HBM baseline. Thus, this energy-efficient, high-bandwidth FGDRAM architecture addresses the needs of future extreme-bandwidth memory systems.
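The two headline ratios in the abstract (4× bandwidth, 2× energy efficiency) can be combined into a rough power comparison. The sketch below is illustrative only and is not taken from the paper: it assumes that "2× energy efficiency" can be read as half the energy per transferred bit, that DRAM power scales as bandwidth times energy per bit, and it ignores background and refresh power. All values are normalized to an HBM2 baseline of 1.0; the names `DramPoint` and `relative_power` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramPoint:
    """Relative operating point, normalized to an HBM2 baseline of 1.0."""
    bandwidth: float       # relative peak bandwidth
    energy_per_bit: float  # relative energy per transferred bit

    def power_at(self, utilized_bandwidth: float) -> float:
        # Simplified model: power = (bits per second) x (energy per bit).
        return utilized_bandwidth * self.energy_per_bit


HBM2 = DramPoint(bandwidth=1.0, energy_per_bit=1.0)

# Assumption: "2x energy efficiency" is interpreted as half the energy per bit.
FGDRAM = DramPoint(
    bandwidth=4.0 * HBM2.bandwidth,
    energy_per_bit=HBM2.energy_per_bit / 2.0,
)


def relative_power(point: DramPoint, baseline: DramPoint,
                   utilized_bandwidth: float) -> float:
    """Power of `point` relative to `baseline` at the same delivered bandwidth."""
    return point.power_at(utilized_bandwidth) / baseline.power_at(utilized_bandwidth)


# Same delivered bandwidth as the baseline's peak: half the power.
iso_bandwidth_ratio = relative_power(FGDRAM, HBM2, HBM2.bandwidth)

# Each device running at its own peak: 4x the bits at half the energy per bit.
peak_ratio = FGDRAM.power_at(FGDRAM.bandwidth) / HBM2.power_at(HBM2.bandwidth)

print(f"iso-bandwidth power ratio: {iso_bandwidth_ratio}")
print(f"peak-to-peak power ratio:  {peak_ratio}")
```

Under these assumptions, delivering 4× the bandwidth costs about 2× the DRAM power rather than 4×, which is the sense in which the abstract ties the bandwidth gain to an acceptable energy budget.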

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 2017

850 pages

ISBN:9781450349529

DOI:10.1145/3123939

General Chairs: Hillery Hunter (IBM Research), Jaime Moreno (IBM Research)

Program Chairs: Joel Emer (NVIDIA and MIT), Daniel Sanchez (MIT)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 14 - 18, 2017

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
