ABSTRACT
Caches are known to consume a large part of total microprocessor power. Traditionally, voltage scaling has been used to reduce both dynamic and leakage power in caches. However, aggressive voltage reduction causes process-variation-induced failures in cache SRAM arrays, which compromise cache reliability. We present Multi-Copy Cache (MC2), a new cache architecture that achieves significant reduction in energy consumption through aggressive voltage scaling, while maintaining high error resilience (reliability) by exploiting multiple copies of each data item in the cache. Unlike many previous approaches, MC2 does not require any error map characterization and therefore remains responsive to changing operating conditions of the cache (e.g., Vdd noise, temperature, and leakage). MC2 also incurs significantly lower overheads than other ECC-based caches. Our experimental results on embedded benchmarks demonstrate that MC2 achieves up to 60% reduction in energy and energy-delay product (EDP) with only a 3.5% reduction in IPC and no appreciable area overhead.
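The abstract does not specify how MC2 reconciles its copies on a read. As one illustration of why keeping multiple copies of a data item tolerates low-voltage bit failures without an error map, the sketch below assumes three copies combined by a bitwise majority vote (a generic triple-modular-redundancy scheme, not necessarily MC2's exact mechanism) and assumes each stored bit fails independently with probability p in each copy. The function names are hypothetical.

```python
def vote3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies of a word.

    Each output bit is 1 iff at least two of the three copies hold a 1 there,
    so a single faulty copy at any bit position is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


def bit_error_after_vote(p: float) -> float:
    """Probability that a voted bit is wrong.

    Assumption: each copy's bit is read incorrectly with probability p,
    independently of the other copies. The vote fails iff at least two of
    the three copies are wrong: 3p^2(1 - p) + p^3.
    """
    return 3 * p**2 * (1 - p) + p**3


def word_failure_prob(p: float, bits: int = 32, copies: int = 1) -> float:
    """Probability that at least one bit of a `bits`-wide word is read wrong.

    Only copies == 1 (no redundancy) and copies == 3 (majority vote) are modeled.
    """
    if copies == 1:
        q = p
    elif copies == 3:
        q = bit_error_after_vote(p)
    else:
        raise ValueError("this sketch models only 1 or 3 copies")
    return 1 - (1 - q) ** bits


if __name__ == "__main__":
    stored = 0b1011_0110
    copy1 = stored ^ 0b0000_0100  # hypothetical fault in bit 2 of copy 1
    copy2 = stored                # fault-free copy
    copy3 = stored ^ 0b1000_0000  # hypothetical fault in bit 7 of copy 3
    print(vote3(copy1, copy2, copy3) == stored)

    p = 1e-3  # illustrative per-bit failure probability, not a measured value
    print(word_failure_prob(p, copies=1), word_failure_prob(p, copies=3))
```

Because faults in different copies rarely coincide at the same bit position, the voted word stays correct even though every copy may contain errors; the cost is that storing extra copies reduces the cache's effective capacity.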
E < MC2: less energy through multi-copy cache