Elsevier

Microelectronics Reliability

Volume 83, April 2018, Pages 84-90
Modular fault tolerant processor architecture on a SoC for space

https://doi.org/10.1016/j.microrel.2018.02.011

Abstract

Owing to their configurability and increasingly complex architectures, FPGAs have brought advantages to many applications, such as avionics and safety-critical aerospace systems, by allowing in-system reconfiguration after launch. Commercial FPGAs suffer from radiation-induced failures provoked by high-energy particles in space; for this reason, fault tolerant techniques are necessary to harden these devices. This paper presents the design of a fault tolerant multicore processor architecture, based on a novel modular voting strategy, that fits in FPGAs and System-on-Chip (SoC) devices with an even number of processors. The architecture is implemented within a commercial off-the-shelf (COTS) SoC so that it can be used safely in space missions. To strengthen the fault tolerance of the embedded multicore processor architecture, different fault tolerance techniques are combined.

Introduction

Soft errors induced by radiation are a major issue for electronic systems [1]. In critical applications, techniques to mitigate the errors are used to ensure that they do not compromise system reliability [2]. These techniques can be applied at technology, circuit and system level or combined to provide a cross layer solution. In all cases, protecting the circuit implies area and power overheads.

SRAM-based logic devices such as SoCs or FPGAs are susceptible to single-event upsets (SEUs) and single-event functional interrupts (SEFIs) in harsh radiation environments such as space [3]. The space community has studied how single-event effects (SEEs) manifest in and affect FPGAs [3–7], and has found that the most common SEE for Xilinx FPGAs is the SEU [8].

SEUs are possible in the configuration and user memories, and the embedded cores of SRAM-based FPGAs. Several mitigation techniques have been used in order to maintain the functionality of the design, after SEUs are detected and corrected.

Unfortunately, most work on SRAM-based FPGAs or SoCs focuses only on the radiation tolerance of the device [6,7] rather than on that of the design. It is common to study simple functional units [9] (flip-flops, look-up tables, routing resources) or modules (Block RAMs, hard macros). Such studies are essential to understanding the problem, but they reflect only small portions of real applications and neglect possible design-dependent effects [10].

To enhance FPGA functionality, embedded processors can be included, either as soft cores that use the reconfigurable logic of the FPGA or as built-in hard cores. In fact, there is an increasing need to support processors in FPGAs [11], especially in the space industry, because commercial hard processors cannot be modified to include specific fault tolerant techniques. On the other hand, rad-hard components, specially designed for space, are more expensive and have a longer time-to-market than their COTS equivalents. Besides the cost, rad-hard processors lag several generations behind COTS processors in both performance and power, as summarized in the survey of processors for space in [12].

The space community has invested significant effort in identifying and deploying methods and techniques for exploiting the advantages of FPGAs within the harsh radiation environment of space [2,3]. These methods include configuration scrubbing, N-Modular Redundancy (NMR), and Error Correction Coding (ECC).

Recent characterizations of advanced commercial foundries show that Total Ionizing Dose (TID) and Single Event Latchup (SEL) tolerance of commercial processes have favorable performance trends, making single event upset (SEU) the primary problem preventing the design of a low power, high speed and radiation hardened computer system [13].

Different techniques can be used to reduce, or even remove, the negative effects that radiation induces in FPGAs when implementing a multicore processor system [14]. These techniques include temporal redundancy, deadlock-free finite state machines, Error Correcting Codes and watchdog timers, among others. Since FPGAs are intended to host soft processor IPs and digital designs, most of the mitigation techniques for digital designs and SoCs are readily applicable to FPGAs as well. Additionally, some FPGA-specific techniques can be used, such as Triple Modular Redundancy (TMR) at local, global or large-grain level, voters in TMR data-paths, scrubbing of the configuration memory and embedded processor protection.
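As an illustration of the voters used in TMR data-paths, the following minimal sketch computes a bitwise majority over three redundant copies of a word. It is a generic textbook voter written in Python for readability, not the voter proposed in this paper; the function name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a word.

    Each output bit takes the value held by at least two of the three
    inputs, so a single upset copy is masked bit by bit.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    golden = 0b1010
    upset = golden ^ 0b1000  # one copy with a single flipped bit
    print(bin(tmr_vote(golden, golden, upset)))
```

A voter of this kind masks an error in one copy but does not repair it, which is why TMR is usually paired with a recovery mechanism such as scrubbing.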

NMR is one of the radiation effect mitigation techniques considered in this work. Running processor cores in an NMR configuration on a configurable device provides a practical, low-cost solution to single-event effects that is flexible to reconfigure and easy to develop.

This paper presents SEU mitigation and recovery techniques for a hard-core processor architecture using NMR. The remainder of the paper is organized as follows: the proposed fault tolerant architecture is presented in Section 2. The monitoring of processor operation is presented in Section 3. In Section 4 the voter overhead is analyzed. Finally, the fault injection campaign and results are presented in Section 5.

Section snippets

Fault tolerant processor architecture proposal

Using reliable processors with reconfigurable devices for space applications is an interesting option, especially in nanosatellite missions, because of advantages such as high performance, low power and cost, and versatility. To implement such a space processing system, COTS components are used because of the advantages discussed in Section 1.

In addition to selecting the type of redundancy, it is necessary to select the right device to implement the multicore processor architecture. Major

Processors operation monitoring

Each pair of processors (P0/P1 and P2/P3) is placed in a different device with its corresponding P-Voter that controls the execution of both (see Fig. 1). The P-Voter compares both signatures and, if they mismatch, it sends a STOP signal to both processors P0/P1 (P2/P3) to interrupt them. As the P-Voter does not know which processor is wrong, it sends a message to the P-Voter placed in the other device, to receive the signatures of processors P2/P3 (P0/P1) and to stop their execution. Once the
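The pair-wise check described above can be sketched as follows. The first function mirrors the local comparison stated in the text (a mismatch means both processors of the pair are stopped). The second function is an assumption made only for illustration: it resolves the faulty processor by a strict majority over the four signatures once both devices have exchanged them, which is not necessarily the procedure used in the paper. All names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def pair_mismatch(sig_a: int, sig_b: int) -> bool:
    """Local P-Voter check: True means STOP is sent to both processors."""
    return sig_a != sig_b


def find_faulty(signatures: Sequence[int]) -> Optional[List[int]]:
    """ASSUMPTION: identify faulty processors by strict majority.

    `signatures` holds one signature per processor (P0..P3 here).
    Returns the indices that disagree with the majority value, or None
    when no value is held by more than half of the processors.
    """
    value, count = Counter(signatures).most_common(1)[0]
    if count * 2 <= len(signatures):
        return None  # tie: the faulty processor cannot be identified
    return [i for i, sig in enumerate(signatures) if sig != value]


if __name__ == "__main__":
    sigs = [0xA5, 0xA5, 0xA5, 0x3C]  # P3 disagrees with the others
    if pair_mismatch(sigs[2], sigs[3]):
        print(find_faulty(sigs))
```

Under this assumption a single faulty processor among four is always identifiable, whereas a two-against-two split is reported as unresolved.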

Voter overhead

In any NMR system, the amount of useful work done is less than in the corresponding non-replicated system. The voting process in an NMR system introduces overhead that reduces the system throughput.

In [18], different voting concepts are presented, together with a trade-off between throughput and reliability that determines the optimal voting frequency. This overhead is made up of many different components and checking processes, which are analyzed in this work.
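To make the throughput side of this trade-off concrete, the sketch below uses a deliberately simple model that is an assumption of this example, not the analysis of the paper or of [18]: execution alternates between a work interval of length `t_work` and a voting interval of length `t_vote`, and only the work interval counts as useful. The function and parameter names are hypothetical.

```python
def useful_fraction(t_work: float, t_vote: float) -> float:
    """Fraction of time spent on useful work in the assumed model.

    Each cycle is t_work of computation followed by t_vote of voting,
    so the useful fraction is t_work / (t_work + t_vote).
    """
    if t_work < 0 or t_vote < 0 or t_work + t_vote == 0:
        raise ValueError("durations must be non-negative and not both zero")
    return t_work / (t_work + t_vote)


if __name__ == "__main__":
    # Voting less often (longer t_work) raises throughput but also
    # lengthens the time an error can go undetected.
    for t_work in (1.0, 9.0, 99.0):
        print(t_work, useful_fraction(t_work, 1.0))
```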

The amount of work done between votes

Implementation and test

To implement the proposed design, two ZC702 prototyping boards [23] were used, each of which includes an XQ7Z020CL484 device. Communication between the boards (Fig. 3(a)) was carried out through two ribbon cables made specifically for this purpose.

Xilinx Vivado 2016.2 was used to implement the proposed design. Table 2 shows the programmable logic resource usage in terms of area utilization (FFs, LUTs and memory tiles) for each device. Fig. 7 shows the implementation of the proposed

Conclusions

We propose a fault tolerant architecture that fits in FPGAs and SoC devices with an even number of processors. This architecture is based on a novel modular voting strategy in which the voter mechanism is distributed across two devices. This voter mechanism provides processor operation monitoring using an IP specifically created for this purpose. The architecture has been ported to a Zynq-7000 device and adapted to its Cortex-A9 hard processors. However, the low resource consumption of the

References (26)

  • N. Kanekawa et al., Dependability in Electronic Systems: Mitigation of Hardware Failures, Soft Errors, and Electro-Magnetic Disturbances (2010)
  • M. Nicolaidis, Design for soft error mitigation, IEEE Trans. Device Mater. Reliab. (2005)
  • M. Wirthlin, FPGAs operating in a radiation environment: lessons learned from FPGAs in space, J. Instrum. (2013)
  • R. Ginosar, A survey of processors for space, Data Systems in Aerospace (DASIA), Eur. Secur. (2012)
  • D.S. Lee et al., Single-event characterization of the 28 nm Xilinx Kintex-7 field-programmable gate array under heavy ion irradiation, IEEE Radiation Effects Data Workshop (2014)
  • G. Allen et al., Virtex-4QV static SEU characterization summary, Xilinx radiation test consortium, Tech. Rep. (2008)
  • H. Quinn et al., Validation techniques for fault emulation of SRAM-based FPGAs, IEEE Trans. Nucl. Sci. (2015)
  • J. George et al., Single event upsets in Xilinx Virtex-4 FPGA devices, Proc. Radiation Data Workshop of the Nuclear and Space Radiation Effects Conference (2006)
  • D.M. Hiemstra et al., Single event upset characterization of the Virtex-4 field programmable gate array using proton irradiation, Proc. IEEE Radiation Effects Data Workshop (2006)
  • R. Giordano et al., Layout and radiation tolerance issues in high-speed links, IEEE Trans. Nucl. Sci. (2015)
  • P. Yiannacouras et al., The microarchitecture of FPGA-based soft processors, Proceedings of the International Conference On Compilers, Architectures And Synthesis For Embedded Systems (2005)
  • M.L. Alles, Radiation hardness of FDSOI and FINFET technologies
  • D. Czajkowski et al., Space Micro Ultra Low-Power Space Computer Leveraging Embedded SEU Mitigation, IEEEAC Paper #1065, Updated (2003)
This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant ESP2014-54505-C2.