Fault buffers

Enabling near-true voltage scaling in variation-sensitive L1 caches

Design Automation for Embedded Systems

Abstract

Voltage scaling can be applied to cache memories to reduce their energy consumption. However, lowering the supply voltage of a cache increases the number of SRAM cells that become defective because of process variations, which decreases yield and nullifies the benefits of voltage scaling. To mitigate this problem, we propose a fault buffer-based scheme for L1 caches. Faults are identified and isolated at the granularity of individual words in the L1 caches. Actively used faulty cache words are dynamically allocated in the fault buffers. The fault buffers are organized as multiple banks for low-cost implementation and can be dynamically reconfigured to reflect the varying performance demands of programs. This dynamic scheme is shown to be more energy- and area-efficient than the previously proposed static schemes while performing comparably to them.
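The word-granularity idea in the abstract can be illustrated with a small behavioral model. The sketch below is not the authors' design: the class names, the LRU replacement policy, the dictionary standing in for the next memory level, and the assumption that fault-free words always hit are all hypothetical simplifications, and banking and dynamic reconfiguration of the buffers are not modeled.

```python
from collections import OrderedDict


class FaultBuffer:
    """Hypothetical small buffer that holds data for faulty cache words (LRU)."""

    def __init__(self, entries):
        self.entries = entries
        self.store = OrderedDict()  # word address -> data, oldest first

    def read(self, addr):
        if addr in self.store:
            self.store.move_to_end(addr)  # mark as most recently used
            return self.store[addr]
        return None

    def write(self, addr, data):
        if addr in self.store:
            self.store.move_to_end(addr)
        elif len(self.store) >= self.entries:
            self.store.popitem(last=False)  # evict the least recently used word
        self.store[addr] = data


class WordFaultTolerantCache:
    """Toy L1 model: faulty words are served from the fault buffer only."""

    def __init__(self, faulty_words, buffer_entries, backing):
        self.faulty = set(faulty_words)  # defect map at word granularity
        self.buffer = FaultBuffer(buffer_entries)
        self.backing = backing           # stands in for the next memory level
        self.refills = 0                 # fault-buffer misses

    def read(self, addr):
        if addr not in self.faulty:
            # Simplification: fault-free words are assumed to hit in the array.
            return self.backing[addr]
        data = self.buffer.read(addr)
        if data is None:
            # Allocate the faulty word in the buffer on first (or repeated) use.
            self.refills += 1
            data = self.backing[addr]
            self.buffer.write(addr, data)
        return data


backing = {a: a * 10 for a in range(8)}
cache = WordFaultTolerantCache({1, 2, 3}, buffer_entries=2, backing=backing)
values = [cache.read(a) for a in (1, 2, 1, 3, 2)]
print(values, cache.refills)
```

Only faulty words that are actually referenced occupy buffer entries, which is the sense in which allocation is dynamic; a buffer smaller than the active set of faulty words shows up as extra refills.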

Notes

  1. The word “Word” should be interpreted in the context of the cited reference.

  2. Content-addressable memory.

  3. For example, eight Defect-mark bits increase the path effort by 38% (assuming tagline = 19 tag + 1 v + 1 lru), but the effort delay is within 5% for N greater than 6 ($D_e = N \times \sqrt[N]{F}$).
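The arithmetic in note 3 can be checked with a short calculation. This is a sketch under stated assumptions: it reads the note's effort-delay expression as D_e = N·F^(1/N), with N the number of stages and F the path effort, and it assumes that path effort scales linearly with the number of bits in the tag line (21 bits in the note's baseline). The function names are illustrative only.

```python
def path_effort_ratio(base_bits, extra_bits):
    """Relative path effort after widening the tag line (assumed linear in bits)."""
    return (base_bits + extra_bits) / base_bits


def effort_delay(n_stages, path_effort):
    """Effort delay D_e = N * F**(1/N) for an N-stage path with path effort F."""
    return n_stages * path_effort ** (1.0 / n_stages)


def effort_delay_overhead(n_stages, effort_ratio):
    """Fractional increase in D_e when F grows by effort_ratio.

    The ratio N*(r*F)**(1/N) / (N*F**(1/N)) reduces to r**(1/N),
    so the overhead does not depend on the baseline F.
    """
    return effort_ratio ** (1.0 / n_stages) - 1.0


base_bits = 19 + 1 + 1   # tag + v + lru, as in the note
ratio = path_effort_ratio(base_bits, extra_bits=8)
print(f"path effort increase: {100 * (ratio - 1):.1f}%")
for n in (6, 7, 8):
    print(f"N={n}: effort delay overhead {100 * effort_delay_overhead(n, ratio):.2f}%")
```

Under these assumptions the overhead crosses below 5% between N = 6 and N = 7, which agrees with the note's "N greater than 6".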


Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) grants funded by the Ministry of Education, Science and Technology (2011-0005378, 2012-0000980) and by the Ministry of Knowledge and Economics (10041313).

Author information

Corresponding author

Correspondence to Tayyeb Mahmood.

About this article

Cite this article

Mahmood, T., Kim, S. Fault buffers. Des Autom Embed Syst 17, 411–438 (2013). https://doi.org/10.1007/s10617-012-9104-z

  • DOI: https://doi.org/10.1007/s10617-012-9104-z
