DOI:10.1145/1815961.1815983

Rethinking DRAM design and organization for energy-constrained multi-cores

Published: 19 June 2010

Abstract

DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric.
This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
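
The third innovation pairs a per-line checksum (error detection) with RAID-like redundancy (error correction). The Python sketch below is only an illustration of that general idea, not the paper's design: it assumes each cache line sits entirely in one chip, as in SSA, that a failed checksum identifies the faulty chip, and that one extra chip holds the XOR parity of the corresponding lines in the other chips. The use of CRC-32, the 64-byte line size, the eight data chips, and all function names are assumptions made for the example; the abstract does not specify any of them.

```python
import zlib
from functools import reduce


def checksum(line: bytes) -> int:
    # CRC-32 is only a stand-in; the abstract does not name the checksum used.
    return zlib.crc32(line)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length lines.
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(lines: list[bytes]) -> bytes:
    # RAID-like parity: XOR of the cache lines held by all data chips.
    return reduce(xor_bytes, lines)


def read_line(stored: list[tuple[bytes, int]], parity: bytes, index: int) -> bytes:
    """Return the line held by chip `index`, rebuilding it if its checksum fails.

    `stored` holds one (data, checksum) pair per data chip. The parity line
    itself is not checksummed in this sketch.
    """
    data, saved = stored[index]
    if checksum(data) == saved:
        return data  # common case: only one chip is read
    survivors = []
    for i, (other, other_sum) in enumerate(stored):
        if i == index:
            continue
        if checksum(other) != other_sum:
            raise RuntimeError("more than one chip failed; parity cannot recover")
        survivors.append(other)
    # parity XOR all surviving lines yields the missing line
    return reduce(xor_bytes, survivors, parity)


# Hypothetical setup: 8 data chips, one 64-byte cache line each.
lines = [bytes([chip + 1]) * 64 for chip in range(8)]
stored = [(line, checksum(line)) for line in lines]
parity = make_parity(lines)

# Flip one bit in chip 3's line while leaving its stored checksum untouched.
damaged = bytes([lines[3][0] ^ 0x01]) + lines[3][1:]
stored[3] = (damaged, stored[3][1])
recovered = read_line(stored, parity, 3)
```

In this sketch the error-free path touches a single chip, and the other chips and the parity line are read only after a checksum mismatch, which is one way to see why detection at the cache-line level could keep the cost of stronger fault tolerance low.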

Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN:9781450300537
DOI:10.1145/1815961
  • ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
    ISCA '10
    June 2010
    508 pages
    ISSN:0163-5964
    DOI:10.1145/1816038
Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chipkill
  2. dram architecture
  3. energy-efficiency
  4. locality
  5. subarrays

Qualifiers

  • Research-article

Conference

ISCA '10

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
