A method for evaluating fault coverage using simulated fault injection for digitalized systems in nuclear power plants

https://doi.org/10.1016/j.ress.2005.05.002

Abstract

The fault coverage of digital systems in nuclear power plants is evaluated using a simulated fault injection method. Digital systems have numerous advantages, such as the sharing of hardware elements and hardware replication of the needed number of independent channels. However, the application of digital systems to safety-critical systems in nuclear power plants has been limited due to reliability concerns. Among the reliability issues, fault coverage is one of the most important factors. In this study, we propose a method for evaluating the fault coverage of safety-critical digital systems in nuclear power plants. The system under assessment is a local coincidence logic processor for a digital plant protection system at Ulchin nuclear power plant units 5 and 6. The assessed system is simplified, and a simulated fault injection method is then applied to evaluate the fault coverage of two fault detection mechanisms. In the simulated fault injection experiment, the fault detection coverage of the watchdog timer is 44.2% and that of the read-only memory (ROM) checksum is 50.5%. Our experiments show that the fault coverage of a safety-critical digital system can be effectively quantified using the simulated fault injection method.

Introduction

Modern technologies based on both digital hardware and advanced software algorithms are being rapidly developed and widely used. Owing to the progress of instrumentation and control (I & C) technologies for process engineering, modern digital technology is expected to significantly improve the performance and the safety of nuclear power plants.

However, the migration from analog to digital I & C systems within nuclear power plants has increased the complexity of such systems. The I & C systems that are being developed are computer-based, comprising digital hardware and software components. These systems perform complex functions that are essential to the safety-critical requirements of nuclear power plants. To prevent significant risks from arising, these systems must be dependable [1].

The development of a methodology for the probabilistic safety assessment (PSA) of digital I & C systems is a critical issue. Present PSA techniques are used to evaluate the relative effects of contributing events on system-level safety or reliability. In addition, PSA provides a unified means of assessing physical faults, recovery processes, contributing effects, human actions, and other events that have a high degree of uncertainty [2]. However, conventional PSA techniques cannot adequately evaluate all features of digital systems. Kang and Sung found that fault coverage, common cause failures, and software reliability are the three most critical factors in the safety assessment of digital systems [3]. Among these factors, this study focuses on a method for evaluating the fault coverage of an actual nuclear power plant digital system.

The probability of a fault being properly removed from a fault-tolerant system is referred to as fault coverage. The fault coverage value crucially affects the dependability of a system. Thus, fault coverage is one of the most critical factors in a PSA. There are mathematical and qualitative expressions for the fault coverage. Mathematically, the fault coverage C is defined as the conditional probability that a fault is processed correctly, given that a fault exists:

$$C = P(\text{fault processed correctly} \mid \text{fault existence})$$

Qualitatively, coverage is a measure of the system's ability to detect, locate, contain, and recover from the presence of a fault. There are four primary types of fault coverage: (1) fault detection coverage, (2) fault location coverage, (3) fault containment coverage, and (4) fault recovery coverage. Thus, the term ‘fault processed correctly’ refers to one or more of the four coverage types [4].
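As a minimal sketch of how the conditional probability C defined above might be estimated from injection results, the fraction of injected faults that are processed correctly can serve as a point estimate. The names Campaign, coverage, and pooled_coverage and all counts below are hypothetical and are not taken from the paper. Pooling assumes that every injected fault is equally likely to occur, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Campaign:
    """Outcome of one fault injection campaign (all counts are hypothetical)."""
    name: str
    injected: int   # number of faults injected (the "fault existence" events)
    detected: int   # number of those faults that were processed correctly


def coverage(c: Campaign) -> float:
    """Point estimate of C = P(processed correctly | fault exists)."""
    if c.injected <= 0:
        raise ValueError("at least one fault must be injected")
    if not 0 <= c.detected <= c.injected:
        raise ValueError("detected must lie between 0 and injected")
    return c.detected / c.injected


def pooled_coverage(campaigns: Sequence[Campaign]) -> float:
    """Coverage over all campaigns, weighting each by its number of faults."""
    total_injected = sum(c.injected for c in campaigns)
    total_detected = sum(c.detected for c in campaigns)
    return coverage(Campaign("pooled", total_injected, total_detected))


# Made-up counts for illustration, not the values reported in the paper.
campaigns = [Campaign("unit_a", 200, 50), Campaign("unit_b", 100, 100)]
per_unit = {c.name: coverage(c) for c in campaigns}
overall = pooled_coverage(campaigns)
print(per_unit, overall)
```

With these made-up counts the per-unit estimates are 0.25 and 1.0, and the pooled estimate is 0.5, which shows that a large unit with low coverage pulls the overall value down.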

Most safety-critical systems manage faults in a fail-safe manner when they detect a fault. For example, the digital protection systems of the Ulchin nuclear power units generate safety signals when they detect a fault. In this case, therefore, the fault detection coverage is the quantity of interest. The purpose of this study is to introduce a quantitative fault-detection coverage evaluation method for a fail-safe digital system by using simulated fault injection.

This paper is structured as follows. In Section 2, we describe the fault-detection coverage evaluation method. The target system, the local coincidence logic (LCL) processor in the digital plant protection system (DPPS), is introduced in Section 3. The experiment setup is presented in Section 4. In Section 5, we present some application results from the experiment. We conclude the paper in Section 6.

Section snippets

Coverage evaluation method

Several studies have considered a quantitative evaluation of fault detection coverage by using fault injection methods. Khoche et al. proposed a deductive fault simulator for fault coverage evaluation [5]. Levendel and Menon used hardware description languages to describe small circuits and applied faults to them, such as function variables stuck at 0 or 1 and control faults [6]. Mao and Gulati proposed an RTL fault model and simulation methodology [7]. Hayne and Johnson evaluated

Target system

The target system for the fault coverage evaluation is an LCL processor in the digital plant protection system (DPPS). Fig. 1 shows a block diagram of the DPPS [12]. The DPPS protects the core fuel design limits and the reactor coolant system pressure boundary by tripping the reactor when monitored plant conditions exceed design limits. It is one of the safety-critical digital systems in Ulchin nuclear power plant units 5 and 6.

Each LCL processor in the DPPS performs the coincidence logic function. The LCL

Application experiment setup

The LCL system is a digitalized system whose major hardware consists of a CPU, RAM, and ROM. An actual LCL system is very complex and, for convenience, must be simplified. Fig. 3 shows a simplified block diagram of the LCL system used. This simplified system is designed to imitate the LCL processor in the DPPS and likewise comprises a CPU, RAM, and ROM.
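To illustrate what a simulated stuck-at fault injection on the ROM of such a simplified system could look like, the following toy sketch injects every single-bit stuck-at-0 and stuck-at-1 fault into a small ROM image and counts how many are caught by a checksum. The 8-bit additive checksum, the four-byte ROM image, and all function names are assumptions made for illustration; they are not the checksum scheme or the memory contents of the assessed system.

```python
def checksum(rom: bytes) -> int:
    """8-bit additive checksum (an assumed scheme, chosen for simplicity)."""
    return sum(rom) & 0xFF


def apply_stuck_at(rom: bytes, byte_index: int, bit: int, value: int) -> bytes:
    """Return a copy of rom with one bit permanently forced to value (0 or 1)."""
    faulty = bytearray(rom)
    mask = 1 << bit
    if value:
        faulty[byte_index] |= mask
    else:
        faulty[byte_index] &= ~mask & 0xFF
    return bytes(faulty)


def rom_checksum_coverage(rom: bytes) -> float:
    """Inject every single-bit stuck-at fault and count checksum mismatches."""
    reference = checksum(rom)
    injected = detected = 0
    for byte_index in range(len(rom)):
        for bit in range(8):
            for value in (0, 1):
                injected += 1
                faulty = apply_stuck_at(rom, byte_index, bit, value)
                if checksum(faulty) != reference:
                    detected += 1
    return detected / injected


toy_rom = bytes([0x3C, 0xA5, 0x00, 0xFF])  # hypothetical ROM image
print(rom_checksum_coverage(toy_rom))      # prints 0.5
```

In this toy model, a fault that forces a bit to the value it already stores leaves the ROM contents unchanged and is therefore invisible to the checksum, while every fault that flips a bit changes the sum. The resulting coverage of 0.5 is a property of the toy model only.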

Heartbeat-watchdog timer case

We can calculate the fault detection coverage of the system from the simulation experiment.

Table 4 shows the results of the fault detection coverage of eight sections in the CPU. When 12,210 stuck-at faults are injected into the decoder, about 23.0% of them are detected. When 256 stuck-at faults are injected into the instruction register and 65,536 stuck-at faults are injected into the program counter, nearly 100.0% of the faults are detected in each case. The fault detection coverage of the CPU
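The idea of a heartbeat-watchdog experiment can be sketched with a toy model in which a program loop issues a heartbeat once per pass and a watchdog timer expires if no heartbeat arrives in time. Everything below is an assumption made for illustration: the 3-bit program counter, the loop layout, the timeout, and the function names are hypothetical and do not describe the CPU model or the fault set used in the paper.

```python
PC_BITS = 3                 # toy program counter width (assumption)
LOOP_LEN = 1 << PC_BITS     # the toy program is one loop over all addresses
KICK_ADDR = LOOP_LEN - 1    # the heartbeat is issued at the last address
TIMEOUT = LOOP_LEN          # watchdog limit: cycles allowed without a heartbeat


def force_bit(pc, fault):
    """Apply a permanent stuck-at fault (bit, value) to the program counter."""
    if fault is None:
        return pc
    bit, value = fault
    mask = 1 << bit
    return pc | mask if value else pc & ~mask


def watchdog_trips(fault, cycles=8 * LOOP_LEN):
    """Run the toy loop and report whether the watchdog timer expires."""
    pc = force_bit(0, fault)
    since_kick = 0
    for _ in range(cycles):
        if pc == KICK_ADDR:
            since_kick = 0          # heartbeat resets the watchdog
        else:
            since_kick += 1
            if since_kick > TIMEOUT:
                return True         # no heartbeat in time: fault detected
        pc = force_bit((pc + 1) % LOOP_LEN, fault)
    return False


# Inject every stuck-at-0 and stuck-at-1 fault on each program counter bit.
faults = [(bit, value) for bit in range(PC_BITS) for value in (0, 1)]
detected = [f for f in faults if watchdog_trips(f)]
coverage = len(detected) / len(faults)
print(detected, coverage)
```

In this toy model the three stuck-at-0 faults keep the program counter from ever reaching the heartbeat address, so the watchdog expires. The three stuck-at-1 faults make the loop skip addresses but still reach the heartbeat, so they go undetected and the toy coverage is 0.5. A watchdog of this kind only observes whether the heartbeat arrives, not whether the program executed correctly.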

Conclusions

In this study, we introduced a fault-detection coverage evaluation method to increase the safety of nuclear power plant digital systems. To evaluate the proposed method, a simulated fault injection experiment was performed on a simplified system with a program. The LCL system in the DPPS was selected for assessment. Because of its complexity, the LCL system was simplified. The simplified system consists of a CPU, RAM, and ROM. The permanent stuck-at fault is selected as a possible fault in the system

Acknowledgements

This work is partly supported by the Korean National Research Laboratory (NRL) Program.

References (15)

  • Hyun Gook Kang et al. An analysis of safety-critical digital systems for risk-informed design. Reliab Eng Syst Saf (2002)
  • Kaufman LM, Johnson BW. Embedded digital system reliability and safety analyses, NUREG/GR-0200;...
  • Digital instrumentation and control systems in nuclear power plants (1997)
  • Joanne B. Dugan et al. Coverage modeling for dependability analysis of fault-tolerant systems. IEEE Trans Comput (1989)
  • A. Khoche et al. A behavioral fault simulator for ideal. IEEE Design Test Comput (1992)
  • Y.H. Levendel et al. Test generation algorithms for computer hardware description languages. IEEE Trans Comput (1982)
  • W. Mao et al. Improving gate level fault coverage by RTL fault grading (1996)
There are more references available in the full text version of this article.
