research-article

Improving Fault Tolerance for FPGA SoCs through Post-Radiation Design Analysis

Published: 30 September 2024

Abstract

FPGAs have been shown to operate reliably within harsh radiation environments by employing single-event upset (SEU) mitigation techniques, such as configuration scrubbing, triple modular redundancy (TMR), error correction coding, and radiation-aware implementation techniques. The effectiveness of these techniques, however, is limited in system-level designs that employ complex I/O interfaces with single points of failure. In previous work, a complex SoC system running Linux applied several of these techniques but obtained an improvement of only 14\(\times\) in mean time to failure (MTTF). A detailed post-radiation fault analysis found that reliability was limited by the DDR interface, the global clock network, and the interconnect. This article applies a number of design-specific SEU mitigation techniques to address these limitations. The changes include triplicating the global clock, optimizing the placement of the reduction output voters and input flip-flops, and employing a mapping technique called “striping.” Applying these techniques improved the MTTF of the mitigated design by a factor of 1.54\(\times\), and thus provides a 22.8\(\times\) MTTF improvement over the unmitigated design. A post-radiation fault analysis using BFAT was also performed to find the remaining design vulnerabilities.
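
The MTTF figures quoted in the abstract compound multiplicatively: the improvement over the unmitigated design is the earlier improvement factor times the additional factor. The sketch below only illustrates that arithmetic with the rounded numbers given above; the function names are hypothetical and are not taken from the article. Multiplying the rounded 14\(\times\) by 1.54\(\times\) gives about 21.6\(\times\), so the reported 22.8\(\times\) presumably reflects an unrounded earlier factor near 14.8\(\times\). That is an inference from the arithmetic, not a figure stated in the abstract.

```python
def combined_improvement(prior_factor: float, additional_factor: float) -> float:
    """Total MTTF improvement when a further gain is applied on top of a prior one."""
    return prior_factor * additional_factor


def implied_prior_factor(total_factor: float, additional_factor: float) -> float:
    """Prior improvement implied by a reported total and a reported additional gain."""
    return total_factor / additional_factor


if __name__ == "__main__":
    # Rounded figures quoted in the abstract.
    print(f"{combined_improvement(14, 1.54):.2f}")    # 21.56
    # Prior factor implied by the reported 22.8x total (an inference only).
    print(f"{implied_prior_factor(22.8, 1.54):.2f}")  # 14.81
```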

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 3
September 2024
434 pages
EISSN: 1936-7414
DOI: 10.1145/3613592
Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2024
Online AM: 19 July 2024
Accepted: 02 June 2024
Revised: 11 April 2024
Received: 02 September 2023
Published in TRETS Volume 17, Issue 3

Author Tags

  1. FPGA
  2. TMR
  3. RISC-V
  4. soft processor
  5. radiation testing
  6. fault injection
  7. fault analysis
  8. reliability

Qualifiers

  • Research-article

Funding Sources

  • I/UCRC Program of the National Science Foundation
  • Los Alamos Neutron Science Center (LANSCE)

Article Metrics

  • Total Citations: 0
  • Total Downloads: 260
  • Downloads (Last 12 months): 260
  • Downloads (Last 6 weeks): 26
Reflects downloads up to 15 Jan 2025
