Evaluating Soft Error Mitigation Trade-offs During Early Design Stages

  • Conference paper
Architecture of Computing Systems (ARCS 2021)

Abstract

Protecting computing systems against soft errors is expensive: unoptimized soft error mitigation schemes can incur area, power, and performance overheads. Efficient fault-tolerant design should therefore be guided by an assessment of the cost of developing reliable systems. We present a method to quantify and evaluate the trade-offs involved in protecting a system. Using gem5Panalyzer, the toolset we developed to estimate the vulnerability factors of microprocessors, we swept the level of instruction-level masking and collected the resulting Program Vulnerability Factor (PVF) responses. We evaluated the confidence in the PVF estimates produced by gem5Panalyzer using multiple benchmarks from the MiBench suite, then analyzed the PVF-masking sweep results. How strongly vulnerability responds to mitigation depends on the type of application: with an instruction-level masking effect of 90%, the time-averaged PVF reductions of the selected benchmarks range from a low of 10% to a high of 67%. These differences tell designers whether raising the masking level is worthwhile. Because the masking factor correlates with the effort required to implement mitigation, our method can help optimize system design choices.
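
As an illustration of the headline metric, the following minimal Python sketch computes a time-averaged PVF reduction from two PVF traces. It is not taken from the paper or from gem5Panalyzer: the preview does not give the exact definition, so the sketch assumes the reduction is the relative drop in mean PVF over equally sized time windows. The function names and the trace values are hypothetical.

```python
from statistics import mean


def time_averaged_pvf(trace):
    """Mean PVF over equally sized time windows (each value in [0, 1])."""
    return mean(trace)


def pvf_reduction(baseline, masked):
    """Relative drop in time-averaged PVF, as a fraction of the baseline.

    Assumed definition (not confirmed by the source):
        (mean(baseline) - mean(masked)) / mean(baseline)
    """
    base = time_averaged_pvf(baseline)
    if base == 0:
        return 0.0  # nothing vulnerable to begin with, so nothing to reduce
    return (base - time_averaged_pvf(masked)) / base


# Hypothetical per-window PVF traces, made up for illustration only.
baseline = [0.40, 0.50, 0.60, 0.50]    # no instruction-level masking
masked_90 = [0.20, 0.25, 0.30, 0.25]   # same workload at a 90% masking level

reduction = pvf_reduction(baseline, masked_90)
print(f"time-averaged PVF reduction: {reduction:.0%}")
```

Under this assumed definition, comparing the reduction across benchmarks at a fixed masking level is what would separate applications that benefit strongly from masking from those that do not.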

This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. CNS-1629853 and CNS-1629839.

Author information

Correspondence to Hao Qiu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, H., Lin, BT., Olowogemo, S.A., Robinson, W.H., Limbrick, D.B. (2021). Evaluating Soft Error Mitigation Trade-offs During Early Design Stages. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_14

  • DOI: https://doi.org/10.1007/978-3-030-81682-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81681-0

  • Online ISBN: 978-3-030-81682-7

  • eBook Packages: Computer Science (R0)
