Abstract
Protecting computing systems against soft errors is expensive: unoptimized soft error mitigation schemes incur area, power, and performance overheads. Efficient fault-tolerant design should therefore be guided by an assessment of the cost of developing reliable systems. We present a method to quantify and evaluate the trade-offs involved in protecting a system. Using gem5Panalyzer, the toolset we developed to estimate the vulnerability factors of microprocessors, we conducted PVF-masking sweeps to collect the Program Vulnerability Factor (PVF) response to instruction-level masking. We evaluated the confidence in gem5Panalyzer's PVF estimates using multiple benchmarks from the MiBench suite, then analyzed the PVF-masking sweep results. The sensitivity of vulnerability improvement to mitigation techniques varies with the type of application: when the instruction-level masking effect is 90%, the time-averaged PVF reductions of the selected benchmarks range from a low of 10% to a high of 67%. These differences in PVF reduction tell designers whether improving the masking level is worthwhile. Because the masking factor correlates with the effort required to implement mitigations, our method can help optimize system design choices.
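The idea of a PVF-masking sweep can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not gem5Panalyzer's implementation, whose internals the abstract does not describe. It assumes that PVF in a time window is the fraction of architectural bits that are vulnerable, that only part of those bits belongs to instructions a mitigation can mask, and that a masking factor m removes a fraction m of that maskable part. All function names, parameters, and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each window is (maskable_vulnerable_bits, other_vulnerable_bits).
# This split is an assumption of the toy model, not taken from the paper.
Window = Tuple[float, float]


def time_averaged_pvf(windows: Sequence[Window], total_bits: float,
                      masking: float) -> float:
    """Mean per-window PVF after masking a fraction of the maskable bits."""
    if not 0.0 <= masking <= 1.0:
        raise ValueError("masking must lie in [0, 1]")
    if total_bits <= 0 or not windows:
        raise ValueError("need a positive bit count and at least one window")
    per_window = [((1.0 - masking) * maskable + other) / total_bits
                  for maskable, other in windows]
    return sum(per_window) / len(per_window)


def pvf_reduction(windows: Sequence[Window], total_bits: float,
                  masking: float) -> float:
    """Relative drop in time-averaged PVF versus the unmasked baseline."""
    baseline = time_averaged_pvf(windows, total_bits, 0.0)
    if baseline == 0.0:
        return 0.0
    return 1.0 - time_averaged_pvf(windows, total_bits, masking) / baseline


def masking_sweep(windows: Sequence[Window], total_bits: float,
                  steps: int = 10) -> List[Tuple[float, float]]:
    """Evaluate the PVF reduction at evenly spaced masking factors."""
    return [(k / steps, pvf_reduction(windows, total_bits, k / steps))
            for k in range(steps + 1)]


if __name__ == "__main__":
    # Made-up workload: two windows over a 100-bit architectural state.
    sample = [(60.0, 20.0), (30.0, 50.0)]
    for m, r in masking_sweep(sample, 100.0):
        print(f"masking={m:.1f}  PVF reduction={r:.3f}")
```

In this toy model the reduction at a given masking factor is that factor times the share of vulnerable bits that is maskable, so two workloads given the same masking level can see very different PVF reductions. That is one simple way such benchmark-to-benchmark spread could arise; it is not a claim about the cause of the range reported above.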
This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. CNS-1629853 and CNS-1629839.
© 2021 Springer Nature Switzerland AG
Cite this paper
Qiu, H., Lin, BT., Olowogemo, S.A., Robinson, W.H., Limbrick, D.B. (2021). Evaluating Soft Error Mitigation Trade-offs During Early Design Stages. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_14
DOI: https://doi.org/10.1007/978-3-030-81682-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81681-0
Online ISBN: 978-3-030-81682-7
eBook Packages: Computer Science, Computer Science (R0)