A review of process fault detection and diagnosis: Part I: Quantitative model-based methods

https://doi.org/10.1016/S0098-1354(02)00160-6

Abstract

Fault detection and diagnosis is an important problem in process engineering. It is the central component of abnormal event management (AEM), which has attracted considerable attention recently. AEM deals with the timely detection, diagnosis and correction of abnormal conditions or faults in a process. Early detection and diagnosis of process faults, while the plant is still operating in a controllable region, can help avoid abnormal event progression and reduce productivity losses. Since the petrochemical industries alone lose an estimated 20 billion dollars every year through poor management of such events, they have rated AEM as their number one problem to be solved. Hence, there is considerable interest in this field now from industrial practitioners as well as academic researchers, as opposed to a decade or so ago. There is an abundance of literature on process fault diagnosis, ranging from analytical methods to artificial intelligence and statistical approaches. From a modelling perspective, there are methods that require accurate process models, semi-quantitative models, or qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model information and rely only on historic process data. In addition, given the process knowledge, there are different search techniques that can be applied to perform diagnosis. Such a bewildering array of methodologies and alternatives often poses a difficult challenge to anyone who is not a specialist in these techniques. Some of these ideas seem so far apart from one another that a non-expert researcher or practitioner is often left wondering about the suitability of a given method for his or her diagnostic situation. While there have been some excellent reviews in this field in the past, they have often focused on a particular branch of this broad discipline, such as analytical models. The basic aim of this three part series of papers is to provide a systematic and comparative study of various diagnostic methods from different perspectives. We broadly classify fault diagnosis methods into three general categories and review them in three parts: quantitative model-based methods, qualitative model-based methods, and process history based methods. In the first part of the series, the problem of fault diagnosis is introduced and approaches based on quantitative models are reviewed. In the remaining two parts, methods based on qualitative models and process history data are reviewed. Furthermore, these disparate methods are compared and evaluated against a common set of criteria introduced in the first part of the series. We conclude the series with a discussion of the relationship of fault diagnosis to other process operations and of emerging trends such as hybrid, blackboard-based frameworks for fault diagnosis.

Introduction

The discipline of process control has made tremendous advances in the last three decades with the advent of computer control of complex processes. Low-level control actions such as opening and closing valves, collectively called regulatory control, which used to be performed by human operators, are now routinely and successfully performed in an automated manner with the aid of computers. With progress in distributed control and model predictive control systems, the benefits to industrial segments such as the chemical, petrochemical, cement, steel, power and desalination industries have been enormous. However, a very important control task in managing process plants still remains largely a manual activity performed by human operators: the task of responding to abnormal events in a process. This involves the timely detection of an abnormal event, diagnosing its causal origins and then taking appropriate supervisory control decisions and actions to bring the process back to a normal, safe operating state. This entire activity has come to be called Abnormal Event Management (AEM), a key component of supervisory control.

However, this complete reliance on human operators to cope with such abnormal events and emergencies has become increasingly difficult for several reasons. It is difficult because of the broad scope of the diagnostic activity, which encompasses a variety of malfunctions such as process unit failures, process unit degradation, parameter drifts and so on. It is further complicated by the size and complexity of modern process plants. For example, in a large process plant there may be as many as 1500 process variables observed every few seconds (Bailey, 1984), leading to information overload. In addition, the emphasis is often on quick diagnosis, which places certain constraints and demands on the diagnostic activity. Furthermore, the task of fault diagnosis is made difficult by the fact that the process measurements may often be insufficient, incomplete and/or unreliable due to a variety of causes such as sensor biases or failures.

Given such difficult conditions, it should come as no surprise that human operators tend to make erroneous decisions and take actions which make matters even worse, as reported in the literature. Industrial statistics show that about 70% of industrial accidents are caused by human error. These abnormal events have significant economic, safety and environmental impact. Despite advances in computer-based control of chemical plants, the fact that two of the worst-ever chemical plant accidents, namely Union Carbide's accident at Bhopal, India, and Occidental Petroleum's Piper Alpha accident (Lees, 1996), happened in recent times is troubling. Another major recent incident is the explosion at Kuwait Petrochemical's Mina Al-Ahmedi refinery in June 2000, which resulted in about 100 million dollars in damages.

Further, industrial statistics have shown that even though major catastrophes and disasters from chemical plant failures may be infrequent, minor accidents are very common, occurring on a day-to-day basis, resulting in many occupational injuries and illnesses and costing society billions of dollars every year (Bureau of Labor Statistics, 1998, McGraw-Hill Economics, 1985, National Safety Council, 1999). It is estimated that the petrochemical industry in the US alone incurs approximately 20 billion dollars in annual losses due to poor AEM (Nimmo, 1995). The losses are much larger when similar situations in other industries, such as pharmaceuticals, specialty chemicals and power, are included. Similar accidents cost the British economy up to 27 billion dollars every year (Laser, 2000).

Thus, here is the next grand challenge for control engineers. In the past, the control community showed how regulatory control could be automated using computers, thereby removing it from the hands of human operators. This has led to great progress in product quality and consistency, process safety and process efficiency. The current challenge is the automation of AEM using intelligent control systems, thereby providing human operators with assistance in this most pressing area of need. People in the process industries view this as the next major milestone in control systems research and application.

The automation of process fault detection and diagnosis forms the first step in AEM. Due to the broad scope of the process fault diagnosis problem and the difficulties in its real-time solution, various computer-aided approaches have been developed over the years. They cover a wide variety of techniques, from the early attempts using fault trees and digraphs to analytical approaches, knowledge-based systems and neural networks in more recent studies. From a modelling perspective, there are methods that require accurate process models, semi-quantitative models, or qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model information and rely only on process history information. In addition, given the process knowledge, there are different search techniques that can be applied to perform diagnosis. Such a bewildering array of methodologies and alternatives often poses a difficult challenge to anyone who is not a specialist in these techniques. Some of these ideas seem so far apart from one another that a non-expert researcher or practitioner is often left wondering about the suitability of a given method for his or her diagnostic situation. While there have been some excellent reviews in this field in the past, they have often focused on a particular branch of this broad discipline, such as analytical models.

The basic aim of this three part series of papers is to provide a systematic and comparative study of various diagnostic methods from different perspectives. We broadly classify fault diagnosis methods into three general categories and review them in three parts. They are quantitative model based methods, qualitative model based methods, and process history based methods. We review these different approaches and attempt to present a perspective showing how these different methods relate to and differ from each other. While discussing these various methods, we will also try to point out important assumptions, drawbacks and advantages that are not stated explicitly and are difficult to gather. Due to the broad nature of this exercise, it is not possible to discuss every method in full detail. Hence the intent is to provide the reader with the general concepts and lead him or her to literature that provides a good entry point into this field.

In the first part of the series, the problem of fault diagnosis is introduced and fault diagnosis approaches based on quantitative models are reviewed. In the following two parts, fault diagnostic methods based on qualitative models and process history data are reviewed. Further, these disparate methods will be compared and evaluated based on a common set of desirable characteristics for fault diagnostic classifiers introduced in this paper. The relation of fault diagnosis to other process operations and a discussion on future directions are presented in Part III.

By way of introduction, we first address the definitions and nomenclature used in the area of process fault diagnosis. The term fault is generally defined as a departure from an acceptable range of an observed variable or a calculated parameter associated with a process (Himmelblau, 1978). This defines a fault as a process abnormality or symptom, such as a high temperature in a reactor or low product quality. The underlying cause(s) of this abnormality, such as a failed coolant pump or controller, is (are) called the basic event(s) or the root cause(s). The basic event is also referred to as a malfunction or a failure. Since one can view the task of diagnosis as a classification problem, the diagnostic system is also referred to as a diagnostic classifier. Fig. 1 depicts the components of a general fault diagnosis framework. The figure shows a controlled process system and indicates the different sources of failures in it. In general, one has to deal with three classes of failures or malfunctions, as described below:

In any modelling, there are processes occurring below the selected level of detail of the model. These unmodelled processes are typically lumped into parameters, which include interactions across the system boundary. Parameter failures arise when there is a disturbance entering the process from the environment through one or more exogenous (independent) variables. An example of such a malfunction is a change in the concentration of the reactant in the reactor feed from its normal or steady-state value. Here, the concentration is an exogenous variable, i.e. a variable whose dynamics are not modelled along with those of the process. Another example is a change in the heat transfer coefficient due to fouling in a heat exchanger.

Structural changes refer to changes in the process itself. They occur due to hard failures in equipment. Structural malfunctions result in a change in the information flow between various variables. To handle such a failure in a diagnostic system would require the removal of the appropriate model equations and restructuring the other equations in order to describe the current situation of the process. An example of a structural failure would be failure of a controller. Other examples include a stuck valve, a broken or leaking pipe and so on.

Gross errors usually occur with actuators and sensors. These could be due to a fixed failure, a constant bias (positive or negative) or an out-of-range failure. Some of the instruments provide feedback signals which are essential for the control of the plant. A failure in one of these instruments could cause the plant state variables to deviate beyond acceptable limits unless the failure is detected promptly and corrective action is taken in time. It is the purpose of diagnosis to quickly detect any instrument fault which could seriously degrade the performance of the control system.
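To make these sensor failure modes concrete, the following sketch (a minimal illustration; the temperature signal, its acceptable limits and the fault magnitudes are hypothetical and not taken from this paper) simulates a fixed (frozen) failure, a constant bias and an out-of-range failure, and applies a simple limit check to each.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reactor temperature measurement (deg C) with sensor noise.
true_temp = 350.0 + 2.0 * np.sin(np.linspace(0.0, 10.0, 200))
measured = true_temp + rng.normal(0.0, 0.5, size=true_temp.shape)

def inject_gross_error(signal, mode, start=100):
    """Return a copy of the signal with a gross sensor error injected from 'start' onwards."""
    faulty = signal.copy()
    if mode == "fixed":           # sensor freezes at its last good value
        faulty[start:] = faulty[start - 1]
    elif mode == "bias":          # constant positive bias of 10 deg C
        faulty[start:] += 10.0
    elif mode == "out_of_range":  # reading pegs at an implausible value
        faulty[start:] = 1000.0
    return faulty

# Crude detection rule: flag any reading outside an assumed acceptable range.
LOW, HIGH = 300.0, 400.0
for mode in ("fixed", "bias", "out_of_range"):
    y = inject_gross_error(measured, mode)
    violations = np.where((y < LOW) | (y > HIGH))[0]
    first = violations[0] if violations.size else None
    print(f"{mode:12s}: first limit violation at sample {first}")
```

Note that the simple limit check only flags the out-of-range failure; the frozen and biased readings remain inside the acceptable range, which is precisely why the model-based residual methods reviewed later are needed to catch such gross errors.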

Outside the scope of fault diagnosis are unstructured uncertainties, process noise and measurement noise. Unstructured uncertainties are mainly faults that are not modelled a priori. Process noise refers to the mismatch between the actual process and the predictions of the model equations, whereas measurement noise refers to the high-frequency additive component in the sensor measurements.
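For reference, in the discrete-time linear setting that underlies most of the quantitative methods reviewed in Section 5, these two noise terms appear as additive terms in a state-space model (standard notation, not specific to any one of the papers cited here):

```latex
\begin{aligned}
x_{k+1} &= A\,x_k + B\,u_k + w_k, \\
y_k     &= C\,x_k + v_k,
\end{aligned}
```

where w_k is the process noise (the model-plant mismatch) and v_k is the measurement noise (the high-frequency additive component in the sensors). In many formulations, faults are represented as additional additive terms driving one or both of these equations.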

In this series of review papers, we provide a review of the various techniques that have been proposed to solve the problem of fault detection and diagnosis. We classify the techniques as quantitative model based, qualitative model based and process history based approaches. Under the quantitative model based approaches, we review techniques that use analytical redundancy to generate residuals that can be used for isolating process failures. We discuss residual generation through diagnostic observers, parity relations, Kalman filters and so on. Under the qualitative model based approaches, we review the signed directed graph (SDG), fault tree, qualitative simulation (QSIM) and qualitative process theory (QPT) approaches to fault diagnosis. Further, we classify diagnostic search strategies as being either topographic or symptomatic searches. Under process history based approaches, we discuss both qualitative techniques, such as expert systems and qualitative trend analysis (QTA), and quantitative techniques, such as neural networks, principal component analysis (PCA) and statistical classifiers.
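To illustrate the residual-generation idea behind these quantitative model-based methods, the sketch below runs a Kalman filter on a hypothetical first-order process and monitors the normalized innovation (measurement minus one-step prediction) as a residual; a sensor bias injected halfway through the run drives the residual away from zero. The model coefficients, noise variances, bias size and detection threshold are all assumptions chosen for illustration, not values from the literature reviewed here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scalar model: x[k+1] = a*x[k] + b*u[k] + w[k],  y[k] = x[k] + v[k]
a, b = 0.95, 0.5
q, r = 0.01, 0.04          # assumed process and measurement noise variances
n = 300
u = np.ones(n)             # constant input

# Simulate the plant, injecting a +3.0 sensor bias from sample 150 onwards.
x, y = 0.0, np.empty(n)
for k in range(n):
    x = a * x + b * u[k] + rng.normal(0.0, np.sqrt(q))
    y[k] = x + (3.0 if k >= 150 else 0.0) + rng.normal(0.0, np.sqrt(r))

# Kalman filter built on the nominal (fault-free) model; the innovation is the residual.
x_hat, P = 0.0, 1.0
residuals = np.empty(n)
for k in range(n):
    x_pred = a * x_hat + b * u[k]          # predict
    P_pred = a * P * a + q
    innov = y[k] - x_pred                  # innovation (residual)
    S = P_pred + r                         # innovation variance
    residuals[k] = innov / np.sqrt(S)      # normalized residual, roughly N(0, 1) when fault-free
    K = P_pred / S                         # update
    x_hat = x_pred + K * innov
    P = (1.0 - K) * P_pred

# Detection rule: a moving average of the normalized residual exceeding a threshold.
window, threshold = 20, 1.0
score = np.convolve(residuals, np.ones(window) / window, mode="valid")
alarms = np.where(np.abs(score) > threshold)[0] + window - 1
print("first alarm at sample:", alarms[0] if alarms.size else None)
```

Before the fault, the normalized residual is roughly zero-mean and the moving-average test stays quiet; after sample 150 the bias shifts the residual and the test fires shortly afterwards. Observer-based and parity-relation schemes generate residuals with different machinery but monitor them in essentially the same way.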

We believe that there have been very few articles that comprehensively review the field of fault diagnosis considering all the different types of techniques discussed in this series of review papers. Most of the review papers, such as the one by Frank, Ding, and Marcu (2000), seem to focus predominantly on model based approaches. For example, in the review by Frank et al., a detailed description of various types of analytical model based approaches is presented. Robustness issues in fault detection, optimized generation of residuals and residual generation for nonlinear systems are among the issues addressed in a comprehensive manner. There are a number of other review articles that fall under the same category. A brief review article that is more representative of all the available fault diagnostic techniques has been presented by Kramer and Mah (1993). This review deals with data validation, rectification and fault diagnosis issues. The fault diagnosis problem is viewed as consisting of feature extraction and classification stages. This view of fault diagnosis has been generalized in our review as the transformations that measurements go through before a final diagnostic decision is attained. The classification stage is examined by Kramer and Mah as falling under three main categories: (i) pattern recognition, (ii) model-based reasoning and (iii) model matching. Under pattern recognition, most of the process history based methods are discussed; under model-based reasoning, most of the qualitative model based techniques are discussed; and symptomatic search techniques using different model forms are discussed under model matching.

Closely associated with the area of fault detection and diagnosis is the research area of gross error detection in sensor data and the subsequent data validation. Gross error detection, or sensor validation, refers to the identification of faulty or failed sensors in the process. Data reconciliation, or rectification, is the task of providing estimates for the true values of sensor readings. Considerable work has been done in this area, and review papers and books have been written on it; hence, we do not review this field in this series of papers. However, as mentioned before, fault diagnosis also includes sensor failures within its scope, and hence data validation and rectification can be viewed as a specific case of the more general fault diagnosis problem (Kramer & Mah, 1993).

The rest of this first part of the review is organized as follows. In the next section, we propose a list of ten desirable characteristics that one would like a diagnostic system to possess. This list would help us assess the various approaches against a common set of criteria. In Section 3, we discuss the transformations of data that take place during the process of diagnostic decision-making. This discussion lays down the framework for analyzing the various diagnostic approaches in terms of their knowledge and search components. In Section 4, a classification of fault diagnosis methods is provided. In Section 5, diagnosis methods based on quantitative models are discussed in detail.

Section snippets

Desirable characteristics of a fault diagnostic system

In the last section, the general problem of fault diagnosis was presented. In order to compare various diagnostic approaches, it is useful to identify a set of desirable characteristics that a diagnostic system should possess. Then the different approaches may be evaluated against such a common set of requirements or standards. Though these characteristics will not usually be met by any single diagnostic method, they are useful to benchmark various methods in terms of the a priori information

Transformations of measurements in a diagnostic system

To attempt a comparative study of various diagnostic methods it is helpful to view them from different perspectives. In this sense, it is important to identify the various transformations that process measurements go through before the final diagnostic decision is made. Two important components in the transformations are the a priori process knowledge and the search technique used. Hence, one can discuss diagnostic methods from these two perspectives. Also, one can view diagnostic methods based

Classification of diagnostic algorithms

As discussed earlier two of the main components in a diagnosis classifier are: (i) the type of knowledge and (ii) the type of diagnostic search strategy. Diagnostic search strategy is usually a very strong function of the knowledge representation scheme which in turn is largely influenced by the kind of a priori knowledge available. Hence, the type of a priori knowledge used is the most important distinguishing feature in diagnostic systems. In this three part review paper we classify the

Quantitative model-based approaches

This section reviews quantitative model-based fault diagnosis methods. The concept of analytical redundancy is introduced first, followed by a description of discrete dynamic systems with linear models. The most frequently used FDI approaches, including diagnostic observers, parity relations, Kalman filters and parameter estimation, are outlined. Recent efforts at generating enhanced residuals to facilitate fault isolation are discussed. We will discuss the principles behind these
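As a minimal concrete instance of analytical redundancy (the flows, noise level, bias and threshold below are assumed for illustration and are not taken from this paper), consider a steady-state mass balance around a mixing point, F1 = F2 + F3, measured by three flow sensors. The balance itself is a static parity relation: its residual should be near zero when all sensors are healthy, and a bias in any one sensor shifts it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed steady-state mass balance around a mixing point: F1 = F2 + F3.
n = 200
F2_true, F3_true = 40.0, 60.0          # hypothetical flows (kg/s)
F1_true = F2_true + F3_true
sigma = 0.5                            # assumed sensor noise standard deviation

F1 = F1_true + rng.normal(0.0, sigma, n)
F2 = F2_true + rng.normal(0.0, sigma, n)
F3 = F3_true + rng.normal(0.0, sigma, n)
F2[120:] += 5.0                        # +5 kg/s bias on sensor F2 from sample 120

r = F1 - F2 - F3                       # parity residual from the mass balance
threshold = 4.0 * sigma * np.sqrt(3)   # 4-sigma limit (residual std is sigma*sqrt(3))

alarms = np.where(np.abs(r) > threshold)[0]
print("first parity-check alarm at sample:", alarms[0] if alarms.size else None)

# The sign of the shifted residual (negative here, since F2 reads high) hints at which
# sensor is at fault; exploiting such signatures systematically is the idea behind
# structured and directional residuals for fault isolation.
```

Dynamic parity relations, diagnostic observers and Kalman filters extend this same idea to models with dynamics, as discussed in the remainder of this section.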

Conclusions

In this first part of the three part review paper, we have reviewed quantitative model based approaches to fault diagnosis. For the comparative evaluation of various fault diagnosis methods, we first proposed a set of desirable characteristics that one would like the diagnostic systems to possess. This can serve as a common set of criteria against which the different techniques may be evaluated and compared. Further, we provided a general framework for analyzing and understanding various

References (65)

  • R.K. Mehra et al., An innovations approach to fault detection and diagnosis in dynamic systems, Automatica (1971)
  • M. Soroush, State and parameter estimations and their applications in process control, Computers and Chemical Engineering (1998)
  • K. Watanabe et al., Incipient fault diagnosis of nonlinear processes with multiple causes of faults, Chemical Engineering Science (1984)
  • A.S. Willsky, A survey of design methods for failure detection in dynamic systems, Automatica (1976)
  • K. Yin, Minimax methods for fault isolation in the directional residual approach, Chemical Engineering Science (1998)
  • P. Young, Parameter estimation for continuous time models: a survey, Automatica (1981)
  • G.A. Almasy et al., Checking and correction of measurements on the basis of linear system model, Problems of Control and Information Theory (1975)
  • S.J. Bailey, From desktop to plant floor, a CRT is the control operator's window on the process, Control Engineering (1984)
  • M. Basseville et al., Detection of abrupt changes in signals and dynamic systems (1986)
  • M. Basseville et al., Detection of abrupt changes: theory and application (1993)
  • Y. Ben-Haim, An algorithm for failure location in a complex network, Nuclear Science and Engineering (1980)
  • Y. Ben-Haim, Malfunction location in linear stochastic systems: application to nuclear power plants, Nuclear Science and Engineering (1983)
  • R.B. Broen, A nonlinear voter-estimator for redundant systems, in Proceedings of IEEE conference on decision... (1974)
  • Occupational injuries and illnesses in the United States by industry (1998)
  • J. Chen et al., Robust model-based fault diagnosis for dynamic systems (1999)
  • E.Y. Chow et al., Analytical redundancy and the design of robust failure detection systems, IEEE Transactions on Automatic Control (1984)
  • R.N. Clark, The dedicated observer approach to instrument fault detection, in Proceedings of the 15th IEEE-CDC... (1979)
  • S. Dash, S. Kantharao, R. Rengaswamy & V. Venkatasubramanian, Application and evaluation of a... (2001)
  • M. Desai & A. Ray, A fault detection and isolation methodology: theory and application, in Proceedings of... (1984)
  • Y. Dingli, J.B. Gomm, D.N. Shields, D. Williams & K. Disdell, Fault diagnosis for a gas-fired furnace... (1995)
  • Z. Fathi et al., Analytical and knowledge-based redundancy for fault diagnosis in process plants, AIChE Journal (1993)
  • P.M. Frank, On-line fault detection in uncertain nonlinear systems using diagnostic observers: a survey, International Journal of Systems Science (1994)