A method for evaluating fault coverage using simulated fault injection for digitalized systems in nuclear power plants
Introduction
Modern technologies based on both digital hardware and advanced software algorithms are being rapidly developed and widely used. Owing to the progress of instrumentation and control (I & C) technologies for process engineering, modern digital technology is expected to significantly improve the performance and safety of nuclear power plants.
However, the migration from analog to digital I & C systems within nuclear power plants has increased the complexity of such systems. The I & C systems that are being developed are computer-based, comprising digital hardware and software components. These systems perform complex functions that are essential to the safety-critical requirements of nuclear power plants. To prevent significant risks from arising, these systems must be dependable [1].
The development of a methodology for the probabilistic safety assessment (PSA) of digital I & C systems is a critical issue. Present PSA techniques are used to evaluate the relative effects of contributing events on system-level safety or reliability. In addition, PSA provides a unified means of assessing physical faults, recovery processes, contributing effects, human actions, and other events that have a high degree of uncertainty [2]. However, conventional PSA techniques cannot adequately evaluate all features of digital systems. Kang and Sung found that fault coverage, common cause failures, and software reliability are the three most critical factors in the safety assessment of digital systems [3]. Among these factors, this study focuses on a method for evaluating the fault coverage of an actual nuclear power plant digital system.
The probability of a fault being properly removed from a fault-tolerant system is referred to as fault coverage. The fault coverage value crucially affects the dependability of a system. Thus, fault coverage is one of the most critical factors in a PSA. Fault coverage can be expressed both mathematically and qualitatively. Mathematically, the fault coverage C is defined as the conditional probability that a fault is processed correctly, given that a fault exists: C = P(fault processed correctly | fault existence).
Qualitatively, coverage is a measure of the system's ability to detect, locate, contain, and recover from the presence of a fault. There are four primary types of fault coverage: (1) fault detection coverage, (2) fault location coverage, (3) fault containment coverage, and (4) fault recovery coverage. Thus, the term ‘fault processed correctly’ refers to one or more of the four coverage types [4].
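As a rough illustration of the mathematical definition, a fault injection campaign can estimate C as the fraction of injected faults that the system processes correctly. The sketch below assumes that simple ratio estimator. The function name and the counts in the usage note are hypothetical and are not taken from this study.

```python
def fault_coverage(processed_correctly: int, injected: int) -> float:
    """Estimate C = P(fault processed correctly | fault existence).

    Uses the ratio of two counts from a fault injection campaign.
    Both arguments are hypothetical inputs chosen for illustration.
    """
    if injected <= 0:
        raise ValueError("at least one fault must be injected")
    if not 0 <= processed_correctly <= injected:
        raise ValueError("processed_correctly must lie between 0 and injected")
    return processed_correctly / injected


if __name__ == "__main__":
    # Hypothetical campaign: 1000 injected faults, 870 of them detected.
    print(fault_coverage(870, 1000))
```

For a fail-safe system, "processed correctly" would be read as "detected", so the same ratio gives an estimate of the fault detection coverage.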
Most safety-critical systems manage faults in a fail-safe manner when they detect a fault. For example, the digital protection systems of the Ulchin nuclear power units generate safety signals when they detect a fault. In this case, therefore, the fault detection coverage is the quantity of interest. The purpose of this study is to introduce a quantitative fault-detection coverage evaluation method for a fail-safe digital system using simulated fault injection.
This paper is structured as follows. In Section 2, we describe the fault-detection coverage evaluation method. The target system and the local coincidence logic (LCL) in the digital plant protection system (DPPS) are introduced in Section 3. The experiment setup is presented in Section 4. In Section 5, we present some application results from the experiment. We conclude the paper in Section 6.
Section snippets
Coverage evaluation method
Several studies have considered a quantitative evaluation of fault detection coverage using fault injection methods. Koche et al. proposed a deductive fault simulator for fault coverage evaluation [5]. Levendal and Menon used hardware description languages to describe small circuits and applied faults to them, such as function variables stuck at 0 or 1 and control faults [6]. Mao and Gulati proposed an RTL fault model and simulation methodology [7]. Hayne and Johnson evaluated
Target system
The target system for the fault coverage evaluation is an LCL processor in the digital plant protection system (DPPS). Fig. 1 shows a block diagram of the DPPS [12]. The DPPS protects the core fuel design limits and the reactor coolant system pressure boundary by tripping the reactor when monitored plant conditions exceed design limits. It is one of the safety-critical digital systems in Ulchin nuclear power plant units 5 and 6.
Each LCL processor in the DPPS performs the coincidence logic function. The LCL
Application experiment setup
The LCL system is a digitalized system whose major hardware consists of a CPU, RAM, and ROM. An actual LCL system is very complex, so a simplified version is used for convenience. Fig. 3 shows a block diagram of the simplified LCL system. This simplified system is designed to imitate the LCL processor in the DPPS and likewise comprises a CPU, RAM, and ROM.
Heartbeat-watchdog timer case
We can calculate the fault detection coverage of the system from the simulation experiment.
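The idea of such an experiment can be sketched with a toy model, which is not the simulated LCL system of this study. It assumes a 4-bit program counter running an eight-instruction loop, a heartbeat emitted at one address, and a watchdog that flags a fault when no heartbeat arrives within a timeout. All constants and function names are hypothetical, and the resulting number is not comparable to the values in Table 4.

```python
from itertools import product

PC_BITS = 4         # width of the toy program counter (assumption)
LOOP_END = 7        # instruction at this address jumps back to 0 (assumption)
HEARTBEAT_ADDR = 3  # instruction at this address emits the heartbeat (assumption)
TIMEOUT = 16        # watchdog expires after this many steps without a heartbeat


def apply_stuck_at(value: int, bit: int, stuck_value: int) -> int:
    """Force one bit of value to 0 or 1, modelling a permanent stuck-at fault."""
    mask = 1 << bit
    return value | mask if stuck_value else value & ~mask


def watchdog_detects(bit: int, stuck_value: int, steps: int = 64) -> bool:
    """Run the toy loop with one stuck-at fault on the PC.

    Returns True if the watchdog times out within the given number of steps.
    """
    pc = apply_stuck_at(0, bit, stuck_value)
    since_heartbeat = 0
    for _ in range(steps):
        if pc == HEARTBEAT_ADDR:
            since_heartbeat = 0  # heartbeat resets the watchdog
        else:
            since_heartbeat += 1
            if since_heartbeat >= TIMEOUT:
                return True
        # Fault-free next address, then the stuck bit corrupts it.
        next_pc = 0 if pc == LOOP_END else (pc + 1) % (1 << PC_BITS)
        pc = apply_stuck_at(next_pc, bit, stuck_value)
    return False


def pc_detection_coverage() -> float:
    """Inject every single-bit stuck-at-0/1 fault and return detected / injected."""
    faults = list(product(range(PC_BITS), (0, 1)))
    detected = sum(watchdog_detects(bit, value) for bit, value in faults)
    return detected / len(faults)


if __name__ == "__main__":
    print(pc_detection_coverage())
```

In this toy loop, four of the eight injected faults trap the program counter in addresses that never reach the heartbeat instruction, so the watchdog detects them. The other four either leave the loop unchanged or shorten it while still passing the heartbeat address, which a timeout-only watchdog cannot see.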
Table 4 shows the fault detection coverage for eight sections of the CPU. When 12,210 stuck-at faults are injected into the decoder, about 23.0% of them are detected. When 256 stuck-at faults are injected into the instruction register and 65,536 stuck-at faults are injected into the program counter, nearly 100.0% of the faults are detected in each case. The fault detection coverage of the CPU
Conclusions
In this study, we introduced a fault-detection coverage evaluation method to increase the safety of nuclear power plant digital systems. To evaluate the proposed method, a simulated fault injection experiment was performed on a simplified system and its program. The LCL system in the DPPS was selected for assessment. Because of its complexity, the LCL system was simplified. The simplified system consists of a CPU, RAM, and ROM. The permanent stuck-at fault was selected as a possible fault in the system
Acknowledgements
This work is partly supported by the Korean National Research Laboratory (NRL) Program.
References (15)
- et al. An analysis of safety-critical digital systems for risk-informed design. Reliab Eng Syst Saf (2002)
- Kaufman LM, Johnson BW. Embedded digital system reliability and safety analyses, NUREG/GR-0200;...
- Digital instrumentation and control systems in nuclear power plants (1997)
- et al. Coverage modeling for dependability analysis of fault-tolerant systems. IEEE Trans Comput (1989)
- et al. A behavioral fault simulator for ideal. IEEE Design Test Comput (1992)
- et al. Test generation algorithms for computer hardware description languages. IEEE Trans Comput (1982)
- et al. Improving gate level fault coverage by RTL fault grading (1996)