Abstract
In systems-on-chip (SoCs), the DMA controller, a peripheral module, plays an important role in data transfer. However, shrinking feature sizes make SoCs, and DMA in particular, prone to radiation-induced soft errors. This paper presents DCRH, a fine-grained software-implemented fault-tolerance method for SoCs that enhances the reliability of DMA against soft errors. DCRH provides fine-grained, selective fault tolerance: it protects DMA without interfering with other SoC modules. Furthermore, it is transparent to the user application because it operates at the driver layer. We present our fault source analysis for DMA on the Xilinx Zynq-7010 SoC and the detailed design of DCRH. The method is then applied to a bare-metal MicroZed board to develop a DCRH-enhanced DMA driver. Finally, SSIFFI is used in simulated DMA fault injection experiments to validate DCRH. The experimental results show that DCRH achieves fault coverage above 97% for DMA, with stable performance.
Acknowledgments
This work was supported by the Chinese National Natural Science Foundation (grant number 11575138), the Industrial PR project of Shaanxi Province (grant number 2013 K06-20) and the Fundamental Research Funds for the Central Universities in China (grant number XJJ2015122).
Additional information
Responsible Editor: M. Abadir
Cite this article
Du, X., Luo, D., He, C. et al. A Fine-Grained Software-Implemented DMA Fault Tolerance for SoC Against Soft Error. J Electron Test 34, 717–733 (2018). https://doi.org/10.1007/s10836-018-5757-2
DOI: https://doi.org/10.1007/s10836-018-5757-2