DOI: 10.1145/1006209.1006212
Article

Enhancing data cache reliability by the addition of a small fully-associative replication cache

Published: 26 June 2004

Abstract

Soft-error-conscious cache design is a necessity for reliable computing. The ECC and parity-based integrity checking techniques in use today trade performance for reliability or vice versa, and the N modular redundancy (NMR) scheme is too costly for microprocessors and applications with stringent cost constraints. This paper proposes a novel and cost-effective solution that enhances data reliability with minimal impact on performance. The idea is to add a small fully-associative cache that stores the replica(s) of every write to the L1 data cache. The replicas can be used to detect and correct soft errors. The replication cache can also be used to increase performance by reducing the L1 data cache miss rate. Our experiments show that more than 97% of read hits in the L1 data cache can find replicas available in a replication cache of 8 blocks.
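
The mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's design: the abstract does not specify the replacement policy, the error-detection code, or the block organization, so the sketch assumes LRU replacement, a single parity bit per stored word for detection, and word-granularity entries. The names `ReplicationCache`, `ProtectedL1`, and `flip_bit` are hypothetical.

```python
from collections import OrderedDict


def parity(word: int) -> int:
    """Return the parity (0 or 1) of the set bits in a non-negative integer."""
    return bin(word).count("1") & 1


class ReplicationCache:
    """Small fully-associative store of recent writes (LRU policy is an assumption)."""

    def __init__(self, num_blocks: int = 8):
        self.num_blocks = num_blocks
        self.entries = OrderedDict()  # address -> replica value

    def write(self, addr: int, value: int) -> None:
        self.entries[addr] = value
        self.entries.move_to_end(addr)        # mark as most recently used
        if len(self.entries) > self.num_blocks:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def lookup(self, addr: int):
        return self.entries.get(addr)         # None if no replica is held


class ProtectedL1:
    """Toy L1 data cache: parity detects an error, a replica corrects it."""

    def __init__(self, replica_blocks: int = 8):
        self.data = {}  # address -> [value, parity bit]
        self.replicas = ReplicationCache(replica_blocks)

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = [value, parity(value)]
        self.replicas.write(addr, value)      # every write is replicated

    def flip_bit(self, addr: int, bit: int) -> None:
        """Inject a single-bit soft error into the stored value (test helper)."""
        self.data[addr][0] ^= 1 << bit

    def read(self, addr: int):
        """Read a word that is present in the cache (misses are not modeled)."""
        value, stored_parity = self.data[addr]
        if parity(value) == stored_parity:
            return value, "clean"
        replica = self.replicas.lookup(addr)
        if replica is None:
            return None, "uncorrectable"      # error detected, no replica left
        self.data[addr] = [replica, parity(replica)]  # repair from the replica
        return replica, "corrected"


cache = ProtectedL1(replica_blocks=2)
cache.write(0x10, 0b1010)
cache.flip_bit(0x10, 0)
print(cache.read(0x10))  # (10, 'corrected')
```

In this sketch a read can only be repaired while its replica is still resident, which is why the fraction of read hits that find a replica is the figure of interest in the abstract.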

Cited By

  • (2022) Featherweight Soft Error Resilience for GPUs. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 245-262. DOI: 10.1109/MICRO56248.2022.00030. Online publication date: Oct-2022
  • (2022) Error Detection. Fault Tolerant Computer Architecture, pp. 19-59. DOI: 10.1007/978-3-031-01723-0_2. Online publication date: 5-Mar-2022
  • (2019) Soft Error Resilience in Chip Multiprocessor Cache using a Markov Model Based Re-usability Predictor. 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 468-476. DOI: 10.1109/ICCD46524.2019.00072. Online publication date: Nov-2019

    Information

    Published In

    ICS '04: Proceedings of the 18th annual international conference on Supercomputing
    June 2004
    360 pages
    ISBN:1581138393
    DOI:10.1145/1006209
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 June 2004

    Author Tags

    1. in-cache replication
    2. soft error
    3. write-back cache

    Qualifiers

    • Article

    Conference

    ICS04
    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 05 Mar 2025

    Citations

    • (2019) Dynamic Selective Warp Scheduling for GPUs Using L1 Data Cache Locality Information, pp. 230-239. DOI: 10.1007/978-981-13-5907-1_24. Online publication date: 8-Feb-2019
    • (2014) SPMCloud. ACM Transactions on Design Automation of Electronic Systems, 19:3, pp. 1-45. DOI: 10.1145/2611755. Online publication date: 23-Jun-2014
    • (2014) Embedded RAIDs-on-chip for bus-based chip-multiprocessors. ACM Transactions on Embedded Computing Systems, 13:4, pp. 1-36. DOI: 10.1145/2533316. Online publication date: 10-Mar-2014
    • (2013) Durable data storage in distributed non persistent caching environment. Proceedings of the 6th ACM India Computing Convention, pp. 1-7. DOI: 10.1145/2522548.2523128. Online publication date: 22-Aug-2013
    • (2013) Free ECC: An efficient error protection for compressed last-level caches. 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 278-285. DOI: 10.1109/ICCD.2013.6657054. Online publication date: Oct-2013
    • (2012) Replicating tag entries for reliability enhancement in cache tag arrays. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20:4, pp. 643-654. DOI: 10.1109/TVLSI.2011.2111469. Online publication date: 1-Apr-2012
    • (2012) Software Controlled Memories for Scalable Many-Core Architectures. Proceedings of the 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp. 1-10. DOI: 10.1109/RTCSA.2012.60. Online publication date: 19-Aug-2012