DOI: 10.1145/1736020.1736064

Virtualized and flexible ECC for main memory

Published: 13 March 2010

Abstract

We present a general scheme for virtualizing main memory error-correction mechanisms, which maps the redundant information needed to correct errors into the memory namespace itself. This basic idea adds flexibility, which we use to increase error protection capabilities, improve power efficiency, and reduce system cost, with only small performance overheads. We augment the virtual memory system architecture to detach the physical mapping of data from the physical mapping of its associated ECC information. We then use this mechanism to develop two-tiered error protection techniques that separate the process of detecting errors from the rare need to also correct errors, and thus save energy. We describe how to provide strong chipkill and double-chipkill protection using existing DRAM and packaging technology. We show how to maintain access granularity and redundancy overheads, even when using ×8 DRAM chips. We also evaluate error correction for systems that do not use ECC DIMMs. Overall, analysis of demanding SPEC CPU 2006 and PARSEC benchmarks indicates that performance overhead is only 1% with ECC DIMMs and less than 10% using standard non-ECC DIMM configurations, that DRAM power savings can be as high as 27%, and that the system energy-delay product is improved by 12% on average.
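
The two-tiered idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class `ToyMemory`, the 8-byte line size, the per-byte parity used for detection, and the single XOR byte used for correction are all assumptions chosen to keep the example small. It only shows the general shape: detection bits are checked on every read, while correction information lives at a separately mapped address in the same namespace and is fetched only after an error is detected.

```python
from functools import reduce
from operator import xor

LINE_BYTES = 8  # toy line size, chosen for illustration only


def parity(byte: int) -> int:
    """Return 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte).count("1") & 1


class ToyMemory:
    """Hypothetical model: data lines and tier-2 ECC share one address space."""

    def __init__(self):
        self.cells = {}     # address -> bytearray; holds data lines and tier-2 ECC
        self.t1 = {}        # data address -> per-byte parity bits (detection only)
        self.ecc_map = {}   # data address -> address of its tier-2 ECC
        self.t2_reads = 0   # how often the correction tier was accessed

    def write(self, addr: int, data: bytes, ecc_addr: int) -> None:
        assert len(data) == LINE_BYTES
        self.cells[addr] = bytearray(data)
        self.t1[addr] = [parity(b) for b in data]
        # Tier 2: XOR of all bytes, enough to rebuild one byte at a known position.
        self.cells[ecc_addr] = bytearray([reduce(xor, data, 0)])
        self.ecc_map[addr] = ecc_addr

    def read(self, addr: int) -> bytes:
        line = self.cells[addr]
        bad = [i for i, b in enumerate(line) if parity(b) != self.t1[addr][i]]
        if not bad:
            return bytes(line)          # common case: tier-1 check only
        if len(bad) > 1:
            raise RuntimeError("uncorrectable in this toy code")
        # Rare case: follow the separate ECC mapping and fetch tier-2 data.
        self.t2_reads += 1
        stored_xor = self.cells[self.ecc_map[addr]][0]
        i = bad[0]
        others = reduce(xor, (b for j, b in enumerate(line) if j != i), 0)
        line[i] = stored_xor ^ others   # rebuild the faulty byte in place
        return bytes(line)


mem = ToyMemory()
mem.write(0x100, b"ABCDEFGH", ecc_addr=0x9000)
mem.read(0x100)               # clean read, tier 2 untouched
mem.cells[0x100][3] ^= 0x04   # inject a single-bit fault into one byte
mem.read(0x100)               # detected by tier 1, repaired using tier 2
```

Because `ecc_map` is independent of where the data itself is placed, the correction bytes could in principle be put anywhere, or omitted for data that does not need them. The toy code does not protect the tier-2 byte itself and cannot detect an even number of bit flips within one byte; a real design would use much stronger codes.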


Published In

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010
422 pages
ISBN: 9781605588391
DOI: 10.1145/1736020
  • General Chair: James C. Hoe
  • Program Chair: Vikram S. Adve
  • ACM SIGPLAN Notices, Volume 45, Issue 3 (ASPLOS '10), March 2010, 399 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1735971
  • ACM SIGARCH Computer Architecture News, Volume 38, Issue 1 (ASPLOS '10), March 2010, 399 pages. ISSN: 0163-5964. DOI: 10.1145/1735970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error correction
  2. fault tolerance
  3. memory systems
  4. reliability

Qualifiers

  • Research-article

Conference

ASPLOS '10

Acceptance Rates

ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 85
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Counter-light Memory Encryption. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 724-738. DOI: 10.1109/ISCA59077.2024.00058. Online publication date: 29-Jun-2024
  • (2024) Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems. IEEE Access, vol. 12, pp. 196207-196239. DOI: 10.1109/ACCESS.2024.3514377. Online publication date: 2024
  • (2024) Fault Tolerant Architectures. Handbook of Computer Architecture, pp. 277-320. DOI: 10.1007/978-981-97-9314-3_11. Online publication date: 21-Dec-2024
  • (2023) Structural Coding: A Low-Cost Scheme to Protect CNNs from Large-Granularity Memory Faults. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17. DOI: 10.1145/3581784.3607084. Online publication date: 12-Nov-2023
  • (2023) Unity ECC: Unified Memory Protection Against Bit and Chip Errors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1145/3581784.3607081. Online publication date: 12-Nov-2023
  • (2023) Memory Controller with Adaptive ECC for Reliable System Operation. 2023 36th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI), pp. 1-6. DOI: 10.1109/SBCCI60457.2023.10261959. Online publication date: 28-Aug-2023
  • (2023) Review of Memory RAS for Data Centers. IEEE Access, vol. 11, pp. 124782-124796. DOI: 10.1109/ACCESS.2023.3329984. Online publication date: 2023
  • (2023) Fault Tolerant Architectures. Handbook of Computer Architecture, pp. 1-44. DOI: 10.1007/978-981-15-6401-7_11-1. Online publication date: 17-Feb-2023
  • (2021) Dvé. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 526-539. DOI: 10.1109/ISCA52012.2021.00048. Online publication date: 14-Jun-2021
  • (2021) CARE: Coordinated Augmentation for Elastic Resilience on DRAM Errors in Data Centers. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 533-544. DOI: 10.1109/HPCA51647.2021.00052. Online publication date: Feb-2021
