Abstract
With continued technology scaling, on-chip structures are becoming increasingly susceptible to soft errors. The architectural vulnerability factor (AVF) was introduced to quantify the architectural vulnerability of on-chip structures to soft errors. Recent studies have found that AVF-aware soft error protection helps balance performance and reliability for several structures (e.g., the issue queue and the reorder buffer). The cache is one of the components most susceptible to soft errors and is commonly protected with error correcting codes (ECC). However, protecting caches close to the processor, such as the L1 data cache (L1D), with ECC can incur high overhead, and protecting caches without accurate knowledge of their vulnerability characteristics may lead to over-protection. AVF-aware ECC is therefore attractive for designers who need to balance performance, power, and reliability for caches, especially at an early design stage. In this paper, we improve the methodology of cache AVF computation and develop a new AVF estimation framework, soft error reliability analysis based on SimpleScalar. We then characterize the dynamic vulnerability behavior of the L1D and identify correlations between L1D AVF and various performance metrics. We propose to employ Bayesian additive regression trees to accurately model the variation of L1D AVF and to quantitatively explain the effects of several key performance metrics on L1D AVF. We then employ the bump hunting technique to reduce the complexity of L1D AVF prediction and extract simple selection rules based on several key performance metrics, enabling a simplified and fast estimation of L1D AVF. With this estimation, intervals of high L1D AVF can be identified online, which enables an AVF-aware ECC technique that reduces the overhead of ECC.
Experimental results show that, compared with a traditional ECC technique that provides complete ECC protection throughout the entire lifetime of a program, the AVF-aware ECC technique reduces L1D access latency by 35% and power consumption by 14% on average for SPEC2K benchmarks.
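As a rough illustration of the idea, the sketch below computes AVF per execution interval using the standard definition (ACE bit-cycles divided by total bit-cycles) and enables ECC only for intervals whose AVF exceeds a threshold. The function names, the sample data, and the 0.3 threshold are assumptions made for illustration; the paper derives its actual rules from performance metrics through bump hunting, and those rules are not reproduced here.

```python
from typing import List


def interval_avf(ace_bit_cycles: int, total_bits: int, cycles: int) -> float:
    """AVF of a structure over one interval: ACE bit-cycles / total bit-cycles."""
    if total_bits <= 0 or cycles <= 0:
        raise ValueError("total_bits and cycles must be positive")
    return ace_bit_cycles / (total_bits * cycles)


def ecc_schedule(avfs: List[float], threshold: float = 0.3) -> List[bool]:
    """Enable ECC only in intervals whose AVF exceeds the (hypothetical) threshold."""
    return [avf > threshold for avf in avfs]


# Hypothetical data: a 1024-bit structure observed over three 100-cycle intervals.
TOTAL_BITS = 1024
CYCLES = 100
ace_counts = [10_240, 51_200, 81_920]  # ACE bit-cycles per interval (made up)

avfs = [interval_avf(a, TOTAL_BITS, CYCLES) for a in ace_counts]
schedule = ecc_schedule(avfs)
print(avfs)      # [0.1, 0.5, 0.8]
print(schedule)  # [False, True, True]
```

In this toy run only the second and third intervals would pay the ECC latency and power cost, which is the kind of saving the AVF-aware scheme aims for. An online version would replace the measured AVF with a fast estimate from performance counters.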
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 60970036 and 60873016, and the National High Technology Development 863 Program of China under Grant Nos. 2009AA01Z102 and 2009AA01Z124.
About this article
Cite this article
Ma, AG., Cheng, Y. & Xing, ZC. Accurate and Simplified Prediction of AVF for Delay and Energy Efficient Cache Design. J. Comput. Sci. Technol. 26, 504–519 (2011). https://doi.org/10.1007/s11390-011-1150-7
DOI: https://doi.org/10.1007/s11390-011-1150-7