Abstract
Semiconductor technology evolution suggests that permanent failure rates will increase dramatically with scaling, in particular for SRAM cells. While well known approaches such as error correcting codes exist to recover from failures and provide fault-free chips, they will not be affordable anymore in the future due to their growing cost. Consequently, other approaches like fine-grained disabling and reconfiguration of hardware elements (e.g. individual functional units or cache blocks) will become economically necessary. This fine-grained disabling will degrade performance compared to a fault-free execution. To the best of our knowledge, all static worst-case execution time (WCET) estimation methods assume fault-free processors. Their result is not safe anymore when fine-grained disabling of hardware components is used. In this paper we provide the first method that statically calculates a probabilistic WCET bound in the presence of permanent faults in instruction caches. The proposed method derives a probabilistic WCET bound for a program, cache configuration, and probability of cell failure. As our method relies on static analysis to bound the longest path, its probabilistic nature only stems from the probability that faults actually occur. Our method is computationally tractable because it does not require an exhaustive enumeration of all the possible combinations of faulty cache blocks. Experimental results show that it provides WCET estimates very close to, but never below, the method that derives probabilistic WCETs by enumerating all possible locations of faulty cache blocks. The proposed method not only allows to quantify the impact of permanent faults on WCET estimates, but, most importantly, can be used in architectural exploration frameworks to select the most appropriate fault management mechanisms and design parameters for current and future chip designs.
Similar content being viewed by others
Notes
Especially, an in-order pipeline with a constant latency per instruction is assumed like in Slijepcevic et al. (2013).
Due to the definition of CHMC FM, variable \(x_b\) is split into two distinct variables, one for the first execution of the block (\(x^{first}_b\)) and another for the next executions (\(x^{next}_b\) ).
Nachos web site, http://www.cs.washington.edu/homes/tom/nachos/.
References
Borkar S, Karnik T, Narendra S, Tschanz J, Keshavarzi A, De V (2003) Parameter variations and impact on circuits and microarchitecture. In DAC40, pp 338–342
Bowman K, Tschanz J, Wilkerson C, Lu SL, Karnik T, De V, Borkar S. (2009) Circuit techniques for dynamic variation tolerance. In: DAC46. ACM, New York, pp 4–7. doi:10.1145/1629911.1629915
Cazorla FJ, Quiñones E, Vardanega T, Cucu L, Triquet B, Bernat G, Berger E, Abella J, Wartel F, Houston M, Santinellei L, Kosmidis L, Lo C, Maxim D (2013) Proartis: probabilistically analysable real-time systems. ACM Trans Embed Comput Syst 12:79
Cheng L, Gupta P, Spanos CJ, Qian K, He L (2011) Physically justifiable die-level modeling of spatial variation in view of systematic across wafer variability. IEEE Trans CAD Integr Circuits Syst 30(3):388–401
Chevochot P, Puaut I (1999) Scheduling fault-tolerant distributed hard real-time tasks independently of the replication strategies. In: 6th international conference on real-time computing and applications symposium, pp 356–363
Colin A, Puaut I (2001) A modular and retargetable framework for tree-based WCET analysis. In: Euromicro conference on real-time systems (ECRTS), Delft, pp. 37–44
Engblom J (2002) Processor pipelines and static worst-case execution time analysis. PhD thesis, Uppsala University.
Ferdinand C, Wilhelm R (1998) On predicting data cache behavior for real-time systems. In LCTES ’98: proceedings of the ACM SIGPLAN workshop on languages, compilers, and tools for embedded systems, pp 16–30
Ghosh S, Melhem R, Mossé D (1997) Fault-tolerance through scheduling of aperiodic tasks in hard real-time multiprocessor systems. IEEE Trans Parallel Distrib Syst 8(3):272–284
Hamming R (1950) Error detecting and error correcting codes. Bell Syst Tech J 26(2):147–160
Hardy D, Puaut I (2008) WCET analysis of multi-level non-inclusive set-associative instruction caches. In: Proceedings of the 29th real-time systems symposium, pp 456–466
Hardy D, Puaut I (2013) Static probabilistic worst case execution time estimation for architectures with faulty instruction caches. In: 21st international conference on real-time networks and systems, RTNS 2013, Sophia Antipolis, October 17–18, pp 35–44
Hardy D, Sideris I, Ladas N, Sazeides Y (2012) The performance vulnerability of architectural and non-architectural arrays to permanent faults. In: Proceedings of the 45th annual IEEE/ACM international symposium on microarchitecture, MICRO’12, pp 48–59
Höfig K (2012) Failure-dependent timing analysis: a new methodology for probabilistic worst-case execution time analysis. In: Schmitt J (ed) Measurement, modelling, and evaluation of computing systems and dependability and fault tolerance. Lecture notes in computer science, vol 7201. Springer, Berlin, pp 61–75
Kosmidis L, Abella J, Quiñones E, Cazorla FJ (2013) A cache design for probabilistically analysable real-time systems. In: Proceedings of the conference on design, automation and test in Europe, DATE ’13, pp 513–518
Li X, Roychoudhury A, Mitra T (2006) Modeling out-of-order processors for WCET estimation. Real Time Syst J 34(3):195–227
Li YTS, Malik S (1995) Performance analysis of embedded software using implicit path enumeration. In: Gerber R, Marlowe T (eds) LCTES ’95: proceedings of the ACM SIGPLAN 1995 workshop on languages, compilers, & tools for real-time systems, vol 30, New York, pp 88–98
Maxim D, Houston M, Santinelli L, Bernat G, Davis RI, Cucu-Grosjean L (2012) Re-sampling for statistical timing analysis of real-time systems. In: Proceedings of the 20th international conference on real-time and network systems, RTNS ’12. ACM, New York, pp. 111–120. doi:10.1145/2392987.2393001
McNairy C, Mayfield, J (2005) Montecito error protection and mitigation. In: HPCRI ’05: 1st workshop on high performance computing reliability issues
Mueller F (2000) Timing analysis for instruction caches. Real Time Syst 18(2–3):217–247
Nassif SR, Mehta N, Cao Y (2010) A resilience roadmap. In: DATE, pp 1011–1016
Punnekkat S, Burns A, Davis R (2001) Analysis of checkpointing for real-time systems. Real Time Syst 20(1):83–102
Puschner P, Schedl AV (1997) Computing maximum task execution times: a graph based approach. Proc IEEE Real Time Syst Symp 13:67–91
Reineke J, Grund D, Berg C, Wilhelm R (2007) Timing predictability of cache replacement policies. Real Time Syst 37(2):99–122
Slijepcevic M, Kosmidis L, Abella J, Quinones E, Cazorla F (2013) DTM: degraded test mode for fault-aware probabilistic timing analysis. In: 2013 25th Euromicro conference on real-time systems (ECRTS), pp 237–248
Theiling H, Ferdinand C, Wilhelm R (2000) Fast and precise WCET prediction by separated cache and path analyses. Real Time Syst 18(2–3):157–179
Wilhelm R, Engblom J, Ermedahl A, Holsti N, Thesing S, Whalley D, Bernat G, Ferdinand C, Heckmann R, Mitra T, Mueller F, Puaut I, Puschner P, Staschulat J, Stenström P (2008) The worst-case execution-time problem-overview of methods and survey of tools. ACM Trans Embed Comput Syst 7(3):36:1–36:53. doi:10.1145/1347375.1347389
Zhou ST, Katariya S, Ghasemi H, Draper S, Kim NS (2010) Minimizing total area of low-voltage SRAM arrays through joint optimization of cell size, redundancy, and ECC. In: 2010 IEEE international conference on computer design (ICCD), pp 112–117. doi:10.1109/ICCD.2010.5647605
Acknowledgments
The authors would like to thank Jaume Abella, Benjamin Lesage, Bastien Pasdeloup, Erven Rohou and André Seznec for their fruitful comments on earlier versions of this paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Hardy, D., Puaut, I. Static probabilistic worst case execution time estimation for architectures with faulty instruction caches. Real-Time Syst 51, 128–152 (2015). https://doi.org/10.1007/s11241-014-9212-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11241-014-9212-x