A Fine-Grained Software-Implemented DMA Fault Tolerance for SoC Against Soft Error

Journal of Electronic Testing

Abstract

In systems-on-chip (SoCs), the DMA controller, a peripheral module, plays an important role in data transmission. However, as SoC feature sizes shrink, SoCs become increasingly prone to radiation-induced soft errors, and DMA is especially vulnerable. This paper presents DCRH, a fine-grained, software-implemented fault-tolerance method for SoCs that enhances the reliability of DMA against soft errors. DCRH provides fine-grained, selective fault tolerance: it protects DMA without interfering with other SoC modules. Furthermore, because it operates at the driver layer, it is transparent to user applications. We present our fault source analysis for DMA on the Xilinx Zynq-7010 SoC and the detailed design of DCRH. The method is then applied to a bare-metal MicroZed board to develop a DCRH-enhanced DMA driver. Finally, SSIFFI is used in simulated DMA fault injection experiments to validate DCRH. The experimental results show that DCRH achieves high fault coverage for DMA, above 97%, with stable performance.

Acknowledgments

This work was supported by the Chinese National Natural Science Foundation (grant number 11575138), the Industrial PR project of Shaanxi Province (grant number 2013 K06-20) and the Fundamental Research Funds for the Central Universities in China (grant number XJJ2015122).

Author information

Corresponding author

Correspondence to Xiaozhi Du.

Additional information

Responsible Editor: M. Abadir

About this article

Cite this article

Du, X., Luo, D., He, C. et al. A Fine-Grained Software-Implemented DMA Fault Tolerance for SoC Against Soft Error. J Electron Test 34, 717–733 (2018). https://doi.org/10.1007/s10836-018-5757-2

  • DOI: https://doi.org/10.1007/s10836-018-5757-2
