Accurate and Simplified Prediction of AVF for Delay and Energy Efficient Cache Design

  • Regular Paper
Journal of Computer Science and Technology

Abstract

With continued technology scaling, on-chip structures are becoming increasingly susceptible to soft errors. The architectural vulnerability factor (AVF) was introduced to quantify the architectural vulnerability of on-chip structures to soft errors. Recent studies have found that designing soft error protection techniques with awareness of AVF helps achieve a tradeoff between performance and reliability for several structures (e.g., the issue queue and reorder buffer). The cache is one of the components most susceptible to soft errors and is commonly protected with error correcting codes (ECC). However, protecting caches close to the processor (e.g., the L1 data cache (L1D)) with ECC can incur high overhead, and protecting caches without accurate knowledge of their vulnerability characteristics may lead to over-protection. AVF-aware ECC is therefore attractive for designers who must balance performance, power, and reliability for caches, especially at an early design stage. In this paper, we improve the methodology of cache AVF computation and develop a new AVF estimation framework, soft error reliability analysis based on SimpleScalar. We then characterize the dynamic vulnerability behavior of the L1D and identify correlations between L1D AVF and various performance metrics. We propose employing Bayesian additive regression trees to accurately model the variation of L1D AVF and to quantitatively explain the effects of several key performance metrics on L1D AVF. Next, we employ the bump hunting technique to reduce the complexity of L1D AVF prediction and to extract simple selection rules based on several key performance metrics, enabling a simplified and fast estimation of L1D AVF. With this estimation, intervals of high L1D AVF can be identified online, which enables us to develop an AVF-aware ECC technique that reduces the overhead of ECC. Experimental results show that, compared with a traditional ECC technique that provides complete ECC protection throughout the entire lifetime of a program, the AVF-aware ECC technique reduces L1D access latency by 35% and power consumption by 14% on average for SPEC2K benchmarks.
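
As a rough illustration of the ideas summarized above, the following minimal sketch computes AVF in the commonly used way, as the fraction of bit-cycles in a structure that hold state required for architecturally correct execution (ACE), and then applies a box-shaped rule of the kind bump hunting produces to decide whether ECC is enabled for an interval. The metric names (`ipc`, `l1d_miss_rate`), the thresholds, and the function names are hypothetical placeholders chosen for illustration; they are not the rules, metrics, or results reported in the paper.

```python
from dataclasses import dataclass


def avf(ace_bit_cycles: int, total_bits: int, cycles: int) -> float:
    """AVF = ACE bit-cycles / (bits in the structure * cycles observed)."""
    if total_bits <= 0 or cycles <= 0:
        raise ValueError("total_bits and cycles must be positive")
    return ace_bit_cycles / (total_bits * cycles)


@dataclass
class IntervalMetrics:
    # Hypothetical per-interval performance counters (assumption, not the
    # metric set selected in the paper).
    ipc: float
    l1d_miss_rate: float


def ecc_enabled(m: IntervalMetrics) -> bool:
    """Return True when the interval is predicted to have high L1D AVF.

    The rule is a conjunction of threshold tests (a "box"), which is the
    shape of rule that bump hunting yields. Both thresholds are placeholder
    values, not values taken from the paper.
    """
    return m.ipc < 1.0 and m.l1d_miss_rate > 0.05


if __name__ == "__main__":
    # 250 ACE bit-cycles in a 100-bit structure observed for 10 cycles.
    print(avf(250, 100, 10))
    print(ecc_enabled(IntervalMetrics(ipc=0.8, l1d_miss_rate=0.10)))
```

A rule of this form needs only a few comparisons per interval, which is what makes online identification of high-AVF intervals cheap enough to drive an ECC on/off decision.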

Author information

Corresponding author

Correspondence to An-Guo Ma.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 60970036 and 60873016, and by the National High Technology Development 863 Program of China under Grant Nos. 2009AA01Z102 and 2009AA01Z124.

About this article

Cite this article

Ma, AG., Cheng, Y. & Xing, ZC. Accurate and Simplified Prediction of AVF for Delay and Energy Efficient Cache Design. J. Comput. Sci. Technol. 26, 504–519 (2011). https://doi.org/10.1007/s11390-011-1150-7

  • DOI: https://doi.org/10.1007/s11390-011-1150-7
