Concurrent error detection for finite state machines implemented with embedded memory blocks of SRAM-based FPGAs

https://doi.org/10.1016/j.micpro.2008.03.008

Abstract

We propose a cost-efficient concurrent error detection (CED) scheme for finite state machines (FSMs) designed for implementation with embedded memory blocks (EMBs) available in today’s SRAM-based FPGAs. The proposed scheme is proven to detect each permanent or transient fault associated with a single input or output of any component of the circuit that results in its incorrect state or output. The experimental results obtained using our proprietary FSM synthesis tool show that despite the heterogeneous structure of the proposed CED scheme, the overhead is very low. For the examined benchmark circuits, the circuitry overhead in terms of extra EMBs is in the range of 6.3–56.3%, with an average value of 27.2%, whereas the combined overhead (EMBs and logic cells) calculated under pessimistic assumptions is in the range of 20.7–63.8%, with an average value of 32.2%. This compares favorably with the earlier proposed solutions applicable to conventional FSM designs based on gates and flip–flops for which an overhead exceeding 100% is quite typical.

Introduction

As technology advances to the deep submicron level, digital circuits become increasingly susceptible to faults, especially soft (transient) faults induced by various types of radiation. Failure rates caused by such faults will soon become unacceptable even for mainstream commercial applications [12]. Therefore, there is growing interest in designing digital systems to be fault-tolerant, i.e. protected against faults that occur during normal operation.

There are different ways to provide fault tolerance. Some of these methods are based on the concept of error masking and rely on massive hardware redundancy. Such methods, for example TMR (Triple Modular Redundancy), are very costly and can be afforded only for very critical applications. Alternative methods are based on on-line detection of errors and taking an appropriate “recovery” action (for example, precise identification of the faulty component and its replacement with spare resources available in the system). The key part of the latter techniques is effective concurrent error detection (CED).
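The difference between error masking and error detection can be illustrated with a minimal Python sketch. The function names and the 4-bit example values are assumptions made for illustration only; duplication with comparison is used here simply as the most basic detection mechanism, not as the scheme proposed in this paper.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three module outputs (error masking).

    A fault confined to any single module is outvoted by the other two.
    """
    return (a & b) | (a & c) | (b & c)


def dwc_error(a: int, b: int) -> bool:
    """Duplication with comparison (error detection only).

    Returns True when the two copies disagree; it cannot tell which is wrong.
    """
    return a != b


# Hypothetical 4-bit module output and a single-bit upset in one copy.
correct = 0b1011
faulty = correct ^ 0b0100

masked_output = tmr_vote(correct, faulty, correct)  # fault is masked
error_flag = dwc_error(correct, faulty)             # fault is only flagged
```

Masking needs three copies plus a voter, whereas detection needs two copies plus a comparator and must be followed by a separate recovery action.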

Various CED schemes and the corresponding design techniques have been proposed for digital circuits and systems. Most of these schemes rely on information and hardware redundancy. However, schemes based on time redundancy, such as recomputation with shifted operands, alternating logic or multiple sampling of the outputs, can also support error detection. In microprocessor-based systems, software-oriented concurrent error detection techniques can also be employed.

Most hardware-redundancy-based design techniques for concurrent error detection in digital circuits and systems presented in the literature assume that at some stage of design, a circuit is represented by a network of gates and flip–flops (or equivalent Boolean formulas) and that such a representation is then mapped onto standard cells. Under this assumption, CED schemes and the corresponding design techniques have been proposed for:

  • general combinational logic blocks and sequential subcircuits (implementing finite state machines – FSMs) [1], [6], [13], [14], [19], [27], [38], [40],

  • specific functional units such as adders and ALUs [16], [28].

Other CED techniques have been proposed for alternative circuit implementations, in particular for:

  • sequential circuits operating as microprogrammed control units [18], [39],

  • PLA-based implementations of combinational and sequential circuits [10].

A number of CED schemes and design techniques have also been proposed for circuits implemented with SRAM-based FPGAs. Such solutions are of special importance for the following reasons:

  • FPGAs are becoming an implementation technology of choice for a large class of today’s digital systems, especially when low non-recurring engineering cost, short time-to-market or flexibility are crucial,

  • SRAM-based FPGAs are particularly vulnerable to soft (transient) faults occurring during normal system operation.

The major source of soft (transient) faults and resulting errors in SRAM-based FPGAs are single event upsets (SEUs) induced by external radiation. SEUs affect both functional memory (flip–flops, embedded memory blocks) and configuration memory of an FPGA. Faults that affect the configuration memory can modify the function of LUTs and other programmable components and can also change the circuit interconnection structure [3], [4], [8], [35]. SEUs affecting the FPGA configuration memory are sometimes classified as permanent faults or, more precisely, recoverable permanent faults [3], as they can be corrected by reloading the configuration bit-stream of the FPGA device.

Memory cells, especially those in dense memory arrays, are more susceptible to radiation-induced faults than other logic components. Therefore, circuits implemented with SRAM-based FPGAs, which contain a large number of memory arrays (functional and configuration memory), are more vulnerable to soft faults that occur during normal circuit operation than circuits implemented with other technologies.

One way to protect FPGA-based designs against faults that occur during normal circuit operation is to use radiation-hardened versions of FPGA devices which are based on special design of memory cells [30]. Such high-reliability components are, however, very expensive and not always available for the newest FPGA products. Therefore, solutions that are applicable to standard FPGA devices are of primary importance.

Various dedicated architectures and design techniques to deal with errors that occur during normal operation of FPGA-based systems, especially with SEU-induced errors, have been proposed. These include:

  • triple modular redundancy (TMR), usually combined with readback and partial correction of configuration memory or full reloading of the configuration memory (the process usually referred to as scrubbing) [5], [11], [35],

  • periodic readback for checking the state of the user memory and sections of configuration memory (single frames) augmented with CRC checksums, combined with partial reloading of configuration memory and user memory [2], [3],

  • duplication with comparison (DWC), combined with timing redundancy for fault identification or with reconfiguration based on precompiled configurations [17], [20],

  • implementation of schemes based on concurrent error detection/correction codes using FPGAs [24], [25], [37],

  • adjustments of on-line testing techniques developed for circuits represented by a network of gates and flip–flops (typically implemented with standard cells) through post-synthesis modification of the network [7], [8],

  • dedicated synthesis procedure for self-checking logic implemented with 2-LUT programmable logic blocks (the two LUTs produce complementary outputs) [23],

  • using special placement and routing algorithms [33],

  • using new basic FPGA components (SRAM-cells) optimized for soft errors [34],

  • using new structures of FPGA logic and routing components that facilitate checking the FPGA configuration bits, sometimes combined with appropriate coding techniques for error detection and correction [36], [41].
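As an illustration of the readback-based approach listed above, the following Python sketch models checksum-guarded scrubbing in software. It assumes that configuration frames are plain byte strings and that a golden copy with precomputed CRC-32 values is available; the names `frame_crcs` and `scrub` are hypothetical, and real devices use vendor-specific frame formats and readback interfaces that are not modeled here.

```python
import zlib


def frame_crcs(frames):
    """Compute a CRC-32 checksum for every (assumed) configuration frame."""
    return [zlib.crc32(frame) for frame in frames]


def scrub(live_frames, golden_frames, golden_crcs):
    """Read back each frame, compare its CRC with the golden value and
    reload only the frames that differ. Returns the repaired frame indices."""
    repaired = []
    for index, frame in enumerate(live_frames):
        if zlib.crc32(frame) != golden_crcs[index]:
            live_frames[index] = golden_frames[index]  # partial reload
            repaired.append(index)
    return repaired


# Hypothetical device content: four 8-byte frames.
golden = [bytes([i] * 8) for i in range(4)]
crcs = frame_crcs(golden)

# Model a single event upset: flip one bit in frame 2.
live = list(golden)
upset = bytearray(live[2])
upset[5] ^= 0x10
live[2] = bytes(upset)

repaired = scrub(live, golden, crcs)
```

Such a check runs periodically, so an upset remains undetected until the next readback pass. This latency is what distinguishes the approach from concurrent error detection.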

In this paper, we deal with concurrent error detection (CED) for sequential circuits – finite state machines (FSMs) implemented using embedded memory blocks of FPGAs. The development of CED schemes for such circuits is interesting for the following reasons:

  • embedded memory blocks are available in almost all of today’s SRAM-based FPGAs,

  • embedded memory blocks can be effectively used to implement FSMs [9],

  • architectural features of FPGAs with embedded memory make the implementation of CED in such devices relatively cost-efficient, especially when compared with other types of devices.

The paper is structured as follows. In Section 2, concurrent error detection schemes proposed for various structures of sequential circuits are described. Section 3 presents specific problems associated with an FSM implementation using embedded memory blocks of an FPGA. The proposed concurrent error detection scheme for such an implementation is described in detail in Section 4. In Section 5, it is proven that the proposed scheme guarantees the detection of all faults in the assumed fault model. Then, in Section 6, for a set of benchmark FSMs, the cost of implementing the proposed CED scheme in terms of extra logic components (embedded memory blocks and programmable logic components) is estimated and compared with the earlier proposed solutions applicable to conventional FSM designs based on gates and flip–flops. Finally, Section 7 summarizes our contribution.

Section snippets

Sequential circuits with concurrent error detection

Synthesis of sequential circuits – finite state machines (FSMs) with concurrent error detection has been discussed by many authors. As mentioned in the previous section, most CED techniques proposed for such circuits assume that at some stage of the design process, the circuit is represented by a network of gates and flip–flops and add extra components to this representation, so as to provide the circuit with CED [1], [6], [13], [14], [19], [26], [27], [40]. Concurrent error detection

Problem statement

Assuming that both the memory part and the address modifier part of the circuit in Fig. 1b are implemented with embedded memory blocks, the notation used in Fig. 1b is somewhat misleading. Therefore, in the following part of the paper, we use the following terms:

  • memory G, when referring to the implementation of the address modifier,

  • memory H, when referring to the implementation of the memory part of the circuit in Fig. 1b.

Symbols G and H, introduced here to distinguish the two memory
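A toy software model may help to picture the roles of the two memories. Fig. 1b is not reproduced in this excerpt, so the structure below is an assumption: memory G compresses part of the address into fewer bits, and memory H, addressed by the output of G together with the state, stores the next state and the output. The example FSM (a modulo-4 counter that advances only when both inputs are 1) and all names are hypothetical.

```python
# Memory G (address modifier): addressed by the two inputs (x1, x0).
# Its one-bit word g equals x1 AND x0 (assumed toy function).
G = [0, 0, 0, 1]

# Memory H: addressed by (g, q1, q0); each word holds (next_state, output).
# The list index equals g * 4 + q because g is the outer loop variable.
H = []
for g in (0, 1):
    for q in range(4):
        next_state = (q + g) % 4      # advance only when g == 1
        output = 1 if q == 3 else 0   # Moore-style output: flag state 3
        H.append((next_state, output))


def step(q, x):
    """One clock cycle: read G, form the address of H, read H."""
    g = G[x]
    address = (g << 2) | q
    return H[address]


def run(inputs, q=0):
    """Apply a sequence of 2-bit input values starting from state q."""
    outputs = []
    for x in inputs:
        q, y = step(q, x)
        outputs.append(y)
    return q, outputs


final_state, outputs = run([3, 3, 1, 3, 3])
```

Without G, memory H would need four address bits (16 words); with G it needs three (8 words) plus the 4 words of G. In this toy model G is fed by inputs only, which corresponds to the case in which no state variable feeds the address modifier.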

Proposed concurrent error detection scheme

The fundamental low-cost assumption underlying the development of a concurrent error detection scheme for the circuit of Fig. 2 implies that error detection mechanisms can only add to the width of the memory word, and not to the number of address lines. This requirement is justified by the following observations:

  • an extension of the address width by just one bit doubles the size of the required memory;

  • the address space of EMBs in today’s FPGAs is quite limited – even with all possible
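The first observation follows from simple arithmetic: a memory with a address bits and w-bit words holds 2^a × w bits. The sketch below compares the two ways of adding redundancy. The 10-bit address and 16-bit word are assumed values chosen only for illustration, and the 4 extra bits are an example word extension.

```python
def memory_bits(address_width: int, word_width: int) -> int:
    """Total number of bits in a memory with 2**address_width words."""
    return (1 << address_width) * word_width


# Hypothetical baseline: 1024 words of 16 bits.
base = memory_bits(10, 16)

# Option A: add 4 check bits to every word (address width unchanged).
wider_word = memory_bits(10, 16 + 4)

# Option B: add a single address line (word width unchanged).
extra_address_bit = memory_bits(11, 16)

overhead_wider_word = wider_word / base - 1          # relative growth
overhead_extra_address = extra_address_bit / base - 1
```

Under these assumed sizes, widening the word costs 25% more memory bits, whereas one extra address line costs 100%.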

Effectiveness of fault detection

As was mentioned in Section 3, we assume a combined functional–structural fault model. In this model, the set of target faults consists of faults that result from both permanent and transient phenomena (in particular, from SEUs) and manifest themselves as permanent or transient single-bit errors in the circuit, i.e. faults that produce – in some clock cycle – a single incorrect logic value at the input or output of some component in the functional part of the circuit in Fig. 2, eventually
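To make the fault model more concrete, the following sketch contrasts a single-bit error at the output of a memory with a single-bit error at its address input. It uses a plain even-parity bit per word, which is an assumption made for illustration and not the scheme proposed in this paper; the memory content and function names are hypothetical.

```python
def parity(word: int) -> int:
    """Return 1 if the word contains an odd number of ones, else 0."""
    return bin(word).count("1") & 1


def encode(word: int) -> int:
    """Append an even-parity bit as the least significant bit."""
    return (word << 1) | parity(word)


def error_detected(stored: int) -> bool:
    """A stored word with odd overall parity signals an error."""
    return parity(stored) == 1


# Hypothetical 4-word memory with parity-protected 4-bit words.
memory = [encode(w) for w in (0b1010, 0b0111, 0b0001, 0b1100)]

# Fault at the memory output: one bit of the word read at address 1 flips.
output_fault = memory[1] ^ 0b00100
output_fault_detected = error_detected(output_fault)

# Fault at the memory input: address bit 1 flips, so word 3 is read
# instead of word 1. The word is wrong, but its parity is still valid.
address_fault = memory[1 ^ 0b10]
address_fault_detected = error_detected(address_fault)
```

Word parity alone catches the output fault but misses the address fault, because the wrongly addressed word is itself a valid code word. A scheme that must cover single faults at component inputs as well as outputs therefore needs more than per-word parity.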

Evaluation of overhead

The complexity of the examined implementations of FSMs and, in particular, the overhead associated with concurrent error detection is specified in terms of FPGA resources – EMBs and simple programmable logic components.

In the proposed concurrent error detection scheme, extra EMBs are required to:

  • extend the word of memory H, in most cases by 4 bits (or 5 bits if the address legality is checked); one bit can be saved if the address modifier is not fed by any next state variable (set Qc is empty);

Conclusion

We show that it is possible to design a cost-efficient concurrent error detection scheme for a sequential circuit implemented using embedded memory blocks available in today’s FPGAs. The proposed scheme is proven to detect each permanent or transient fault associated with a single input or output of any component of the circuit that results in its incorrect state or output. Such faults are detected with no latency. It should be emphasized that the set of detectable faults includes faults

Acknowledgement

The author would like to thank Dr. Grzegorz Borowik for providing a large database of detailed results of experiments on synthesis of FSMs intended for memory-based implementation.

References (41)

  • S. Almukhaizim et al., Entropy-driven parity-tree selection for low-overhead concurrent error detection in finite state machines, IEEE Trans. CAD (2006)
  • Altera Application Note 357: Error Detection Using CRC in Altera FPGA Devices, July...
  • G.-H. Asadi, M.B. Tahoori, Soft error mitigation for SRAM-based FPGAs, in: Proc. IEEE VLSI Test Symp., 2005, pp....
  • M. Bellato et al., Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA, in: Proc....
  • C. Bolchini, A. Miele, M.D. Santambrogio, TMR and Partial Dynamic Reconfiguration to mitigate SEU faults in FPGAs, in:...
  • C. Bolchini et al., Design of VHDL-based totally self-checking finite-state machine and data path descriptions, IEEE Trans. VLSI Syst. (2000)
  • C. Bolchini, F. Salice, D. Sciuto, Designing self-checking FPGAs through error detection codes, in: Proc. IEEE Int....
  • C. Bolchini, F. Salice, D. Sciuto, R. Zavaglia, An integrated design approach for self-checking FPGAs, in: Proc. IEEE...
  • G. Borowik, B. Falkowski, T. Luba, Cost-efficient synthesis for sequential circuits implemented using embedded memory...
  • M. Boudjit, M. Nicolaidis, K. Torki, Automatic generation algorithms, experiments and comparisons of self-checking PLA...
  • C. Carmichael, Triple Module Redundancy Design Techniques for Virtex Series FPGA, Xilinx Application Note 197, March....
  • N. Cohen, Soft error considerations for deep-submicron CMOS circuit applications, Dig. IEDM Int. Electron Dev. Meet. (1999)
  • M. Damm, State assignment for detecting erroneous transitions in finite state machines, in: Proc. 10th EUROMICRO Conf....
  • D. Das, N.A. Touba, Synthesis of circuits with low-cost concurrent error detection based on Bose–Lin codes, in: Proc....
  • P. Drineas, Y. Makris, Non-intrusive concurrent error detection in FSMs through state/output compaction and monitoring...
  • S.S. Gorshe, B. Bose, A self-checking ALU design with efficient codes, in: Proc. IEEE VLSI Test Symp., 1996, pp....
  • W.-J. Huang, S. Mitra, E.J. McCluskey, Fast run-time fault location in dependable FPGA-based applications, in: Proc....
  • V.S. Iyengar et al., Concurrent fault detection in microprogrammed control units, IEEE Trans. Comput. (1985)
  • N.K. Jha et al., Design and synthesis of self-checking VLSI circuits, IEEE Trans. CAD (1993)
  • F.G.L. Kastensmidt, Designing fault-tolerant techniques for SRAM-based FPGAs, IEEE Des. Test Comput. (2004)
    This work was supported by the Ministry of Science and Higher Education of Poland – Research Grant No. N517 003 32/0583 for 2007-2010.


    This paper is a revised and significantly extended version of the paper presented at 10th EUROMICRO Conf. on Digital System Design, 2007.
