Modular fault tolerant processor architecture on a SoC for space☆
Introduction
Soft errors induced by radiation are a major issue for electronic systems [1]. In critical applications, mitigation techniques are used to ensure that these errors do not compromise system reliability [2]. The techniques can be applied at the technology, circuit, and system levels, or combined to provide a cross-layer solution. In all cases, protecting the circuit implies area and power overheads.
SRAM-based logic devices such as SoCs or FPGAs are susceptible to single-event upsets (SEUs) and single-event functional interrupts (SEFIs) in harsh radiation environments such as space [3]. The space community has studied how single-event effects (SEEs) manifest in and affect FPGAs [3–7], and has found that the most common SEE for Xilinx FPGAs is the SEU [8].
SEUs can occur in the configuration memory, the user memory, and the embedded cores of SRAM-based FPGAs. Several mitigation techniques have been used to maintain the functionality of the design once SEUs are detected and corrected.
Unfortunately, most work on SRAM-based FPGAs or SoCs focuses only on device radiation tolerance [6,7] rather than on the design. It is common to study simple functional units [9] (flip-flops, look-up tables, routing resources) or modules (Block RAMs, hard macros). These studies are essential to understanding the problem, but they cover only small portions of real applications and neglect possible design-dependent effects [10].
To enhance FPGA functionality, embedded processors can be included, either as soft-cores built from the reconfigurable logic of the FPGA or as built-in hard-cores. There is an increasing need to support processors in FPGAs [11], especially in the space industry, because commercial hard processors cannot be modified to include specific fault tolerant techniques. On the other hand, rad-hard components designed specifically for space are more expensive and have a longer time-to-market than their COTS equivalents. Besides the cost, rad-hard processors lag several generations behind COTS processors in both performance and power, as summarized in the survey of processors for space in [12].
The space community has invested significant effort in identifying and deploying methods and techniques for exploiting the advantages of FPGAs within the harsh radiation environment of space [2,3]. Reliability in that environment is improved through methods such as configuration scrubbing, N-Modular Redundancy (NMR), and Error Correction Coding (ECC).
Recent characterizations of advanced commercial foundries show that Total Ionizing Dose (TID) and Single Event Latchup (SEL) tolerance of commercial processes have favorable performance trends, making single event upset (SEU) the primary problem preventing the design of a low power, high speed and radiation hardened computer system [13].
Different techniques can be used to reduce or even remove the negative effects that radiation induces in FPGAs when implementing a multicore processor system [14]. These techniques include temporal redundancy, deadlock-free finite state machines, Error Correcting Codes, and watchdog timers, among others. Since FPGAs are intended to host soft processor IPs and digital designs, most of the mitigation techniques for digital design and SoCs are readily applicable to FPGAs as well. Additionally, some FPGA-specific techniques can be used, such as Triple Modular Redundancy (TMR) at local, global, or large-grain level, voters in TMR data-paths, scrubbing of configuration memory, and embedded processor protection.
NMR is one of the radiation effect mitigation techniques considered in this work. Using processor cores in NMR on a configurable device provides a practical, low-cost way to mitigate Single Event Effects, and the resulting system can be reconfigured and is easy to develop.
This paper presents SEU mitigation and recovery techniques for a hard-core processor architecture using NMR. The remainder of the paper is organized as follows. The proposed fault tolerant architecture is presented in Section 2. Processor operation monitoring is presented in Section 3. The voter overhead is analyzed in Section 4. Finally, the fault injection campaign and its results are presented in Section 5.
Fault tolerant processor architecture proposal
Using reliable processors with reconfigurable devices for space applications is an interesting option, especially in nanosatellite missions, because of advantages such as high performance, low power, low cost, and versatility. To implement such a space processing system, COTS components are used because of the advantages discussed in Section 1.
In addition to selecting the type of redundancy, it is necessary to select the right device to implement the multicore processor architecture. Major
Processors operation monitoring
Each pair of processors (P0/P1 and P2/P3) is placed in a different device with its corresponding P-Voter that controls the execution of both (see Fig. 1). The P-Voter compares both signatures and, if they mismatch, it sends a STOP signal to both processors P0/P1 (P2/P3) to interrupt them. As the P-Voter does not know which processor is wrong, it sends a message to the P-Voter placed in the other device, to receive the signatures of processors P2/P3 (P0/P1) and to stop their execution. Once the
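
The monitoring steps stated so far (compare the two local signatures, send STOP on a mismatch, then ask the P-Voter on the other device for its signatures and stop its pair) can be illustrated with a minimal behavioural sketch in Python. This is not the authors' IP: the class `PVoter`, its fields, and its method names are hypothetical, signatures are modelled as plain integers, and the sketch deliberately ends where the description above ends, without assuming how the faulty processor is then identified or recovered.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PVoter:
    """Hypothetical model of one P-Voter supervising a pair of processors."""
    name: str
    signatures: Tuple[int, int] = (0, 0)   # latest signatures of the local pair
    stopped: bool = False                  # True once STOP has been sent
    peer: Optional["PVoter"] = None        # P-Voter placed in the other device
    log: List[str] = field(default_factory=list)

    def stop_pair(self) -> None:
        """Model the STOP signal sent to both local processors."""
        self.stopped = True
        self.log.append(f"{self.name}: STOP")

    def handle_peer_request(self) -> Tuple[int, int]:
        """The peer saw a mismatch: stop the local pair and return its signatures."""
        self.stop_pair()
        return self.signatures

    def check(self) -> Optional[Tuple[int, int, int, int]]:
        """Compare the two local signatures.

        Returns None when they match. On a mismatch, stops the local pair,
        requests the peer's signatures (which also stops the peer's pair),
        and returns all four signatures, local pair first.
        """
        first, second = self.signatures
        if first == second:
            return None
        if self.peer is None:
            raise RuntimeError("no peer P-Voter connected")
        self.stop_pair()
        remote = self.peer.handle_peer_request()
        return (first, second, remote[0], remote[1])
```

For example, connecting two such voters, giving one pair the signatures `(0xBEEF, 0xBEE0)` and the other `(0xBEEF, 0xBEEF)`, and calling `check()` on the first stops both pairs and returns the four signatures for whatever resolution step follows.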
Voter overhead
In any NMR system, the amount of useful work done is less than in the corresponding non-replicated system. The voting process in an NMR system introduces overhead that reduces the system throughput.
In [18], different voting concepts are presented, together with a trade-off between throughput and reliability that determines the optimal voting frequency. The overhead is made up of many different components and checking processes, which are analyzed in this research.
The amount of work done between votes
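
To make the throughput side of this trade-off concrete, the following Python sketch uses a deliberately simple model that is an assumption of this example, not the analysis of the paper or of [18]: each voting period consists of `work_cycles` of useful execution followed by `vote_cycles` of voting overhead, so the useful fraction of time is `work_cycles / (work_cycles + vote_cycles)`. The function and parameter names are hypothetical.

```python
def useful_fraction(work_cycles: int, vote_cycles: int) -> float:
    """Fraction of time spent on useful work under the assumed model.

    Assumption: every voting period is `work_cycles` of useful execution
    followed by `vote_cycles` of voting overhead, with nothing overlapped.
    """
    if work_cycles <= 0:
        raise ValueError("work_cycles must be positive")
    if vote_cycles < 0:
        raise ValueError("vote_cycles must be non-negative")
    return work_cycles / (work_cycles + vote_cycles)


if __name__ == "__main__":
    # Voting less often (more work between votes) raises throughput,
    # at the cost of detecting a mismatch later.
    for work in (100, 900, 9900):
        print(work, round(useful_fraction(work, vote_cycles=100), 3))
```

Under this model, a fixed voting cost of 100 cycles leaves half of the time for useful work when votes are 100 cycles apart, and 90% when they are 900 cycles apart.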
Implementation and test
To implement the proposed design, two ZC702 prototyping boards [23] were used, each of which includes an XQ7Z020CL484 device. The communication between the boards (Fig. 3(a)) was carried out through two ribbon cables made specifically for this purpose.
Xilinx Vivado 2016.2 was used to implement the proposed design. Table 2 shows the programmable logic resource usage in terms of area utilization (FFs, LUTs and memory tiles) for each device. Fig. 7 shows the implementation of the proposed
Conclusions
We propose a fault tolerant architecture that fits in FPGA and SoC devices with an even number of processors. This architecture is based on a novel modular voting strategy in which the voter mechanism is distributed across two devices. The voter mechanism provides processor operation monitoring using an IP created specifically for this purpose. The architecture has been implemented on a Zynq-7000 device and adapted to its Cortex-A9 hard processors. However, the low resource consumption of the
References (26)
- et al. Dependability in Electronic Systems: Mitigation of Hardware Failures, Soft Errors, and Electro-Magnetic Disturbances (2010)
- Design for soft error mitigation. IEEE Trans. Device Mater. Reliab. (2005)
- FPGAs operating in a radiation environment: lessons learned from FPGAs in space. J. Instrum. (2013)
- A survey of processors for space, Data Systems in Aerospace (DASIA). Eur. Secur. (2012)
- et al. Single-event characterization of the 28 nm Xilinx Kintex-7 field-programmable gate array under heavy ion irradiation. IEEE Radiation Effects Data Workshop (2014)
- et al. Virtex-4QV static SEU characterization summary, Xilinx radiation test consortium. Tech. Rep. (2008)
- et al. Validation techniques for fault emulation of SRAM-based FPGAs. IEEE Trans. Nucl. Sci. (2015)
- et al. Single event upsets in Xilinx Virtex-4 FPGA devices. Proc. Radiation Data Workshop of the Nuclear and Space Radiation Effects Conference (2006)
- et al. Single event upset characterization of the Virtex-4 field programmable gate array using proton irradiation. Proc. IEEE Radiation Effects Data Workshop (2006)
- et al. Layout and radiation tolerance issues in high-speed links. IEEE Trans. Nucl. Sci. (2015)
- The microarchitecture of FPGA-based soft processors. Proceedings of the International Conference On Compilers, Architectures And Synthesis For Embedded Systems
- Radiation hardness of FDSOI and FINFET technologies
- Space Micro Ultra Low-Power Space Computer Leveraging Embedded SEU Mitigation, IEEEAC Paper #1065, Updated
☆ This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant ESP2014-54505-C2.