Abstract
Voltage scaling can be applied to cache memories to reduce their energy consumption. However, lowering the supply voltage increases the number of SRAM cells that become defective under process variation, which reduces yield and can nullify the benefits of voltage scaling. To mitigate this problem, we propose a fault buffer-based scheme for L1 caches. Faults are identified and isolated at the granularity of individual words in the L1 caches, and actively used faulty cache words are dynamically allocated in the fault buffers. The fault buffers are organized as multiple banks for low-cost implementation and can be dynamically reconfigured to match the varying performance demands of programs. This dynamic scheme is shown to be more energy- and area-efficient than previously proposed static schemes while delivering comparable performance.
Notes
The word “Word” should be interpreted in the context of the cited reference.
Content-addressable memory (CAM).
For example, eight Defect-mark bits increase the path effort by 38% (assuming tagline = 19 tag + 1 v + 1 lru), but the effort delay stays within 5% for N greater than 6 ($D_e = N \times F^{1/N}$).
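The arithmetic in the last note can be checked with a short sketch. It assumes the note's expression is the standard logical-effort relation for an N-stage path with equal stage efforts, D_e = N × F^(1/N), where F is the path effort; the function names below are hypothetical and chosen only for illustration. Under that assumption, scaling F by 1.38 scales the effort delay by 1.38^(1/N), independent of F itself.

```python
def effort_delay(path_effort: float, stages: int) -> float:
    """Effort delay of an N-stage path with equal stage efforts (assumed model)."""
    return stages * path_effort ** (1.0 / stages)


def relative_increase(effort_growth: float, stages: int) -> float:
    """Fractional growth in effort delay when the path effort is scaled by effort_growth."""
    # The factor N and the original F cancel, leaving effort_growth ** (1 / N).
    return effort_growth ** (1.0 / stages) - 1.0


def min_stages_within(effort_growth: float, budget: float, max_stages: int = 64):
    """Smallest stage count whose delay growth fits the budget, or None if none does."""
    for n in range(1, max_stages + 1):
        if relative_increase(effort_growth, n) <= budget:
            return n
    return None


if __name__ == "__main__":
    # 8 extra bits on a 21-bit tag line (19 tag + 1 v + 1 lru) give 29/21, about 1.38.
    for n in range(4, 10):
        print(n, f"{relative_increase(1.38, n):.2%}")
```

With a 38% increase in path effort, the growth in effort delay drops below 5% once the path has seven or more stages, which agrees with the note's "N greater than 6".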
Acknowledgements
This research was supported by the National Research Foundation of Korea (NRF) grants funded by the Ministry of Education, Science and Technology (2011-0005378, 2012-0000980) and by the Ministry of Knowledge and Economics (10041313).
Cite this article
Mahmood, T., Kim, S. Fault buffers. Des Autom Embed Syst 17, 411–438 (2013). https://doi.org/10.1007/s10617-012-9104-z
DOI: https://doi.org/10.1007/s10617-012-9104-z