Elsevier

Automatica

Volume 47, Issue 4, April 2011, Pages 639-649
Active fault tolerant control of discrete event systems using online diagnostics

https://doi.org/10.1016/j.automatica.2011.01.007

Abstract

The aim of this paper is to deal with the problem of fault tolerant control in the framework of discrete event systems modeled as automata. A fault tolerant controller is a controller able to satisfy control specifications both in nominal operation and after the occurrence of a fault. This task is solved by means of a parameterized controller that is suitably updated on the basis of the information provided by online diagnostics: the supervisor actively reacts to the detection of a malfunctioning component in order to eventually meet degraded control specifications. Starting from an appropriate model of the system, we recall the notion of safe diagnosability as a necessary step in order to achieve fault tolerant control. We then introduce two new notions: (i) “safe controllability”, which represents the capability, after the occurrence of a fault, of steering the system away from forbidden zones and (ii) “active fault tolerant system”, which is the property of safely continuing operation after faults. Finally, we show how the problem can be solved using a general control architecture based on the use of a special kind of diagnoser, called “diagnosing controller”, which is used to safely detect faults and to switch between the nominal control policy and a bank of reconfigured control policies. A simple example is used to illustrate the new notions and the control architecture introduced in the paper.

Introduction

Complex technological systems are vulnerable to unpredictable events that can cause undesired reactions and, as a consequence, damage to technical parts of the plant, to personnel, or to the environment. The main objective of the Fault Detection and Isolation (FDI) research area (see, e.g., Patton, Frank, & Clark, 2000) is to study methodologies for identifying and exactly characterizing possible incipient faults arising in predetermined parts of the plant. This is usually achieved by designing a dynamical system which, by processing input/output data, is able to detect the presence of an incipient fault and possibly to isolate it precisely. Once a fault has been detected and isolated, the next natural step is to reconfigure the control law in order to tolerate the fault, namely, to guarantee pre-specified (possibly degraded) performance objectives for the faulty system. In this framework, the FDI phase is usually followed by the design of a fault tolerant control (FTC) system, namely, by the design of a reconfiguring unit that, on the basis of the information provided by the FDI filter, adjusts the controller in order to achieve the prescribed performance for the faulty system (see Blanke, Kinnaert, Lunze, & Staroswiecki, 2003).

The FTC problem can be tackled using either a passive approach or an active one. The passive approach deals with the problem of finding a general controller able to satisfy control specifications both in nominal operation and after the occurrence of a fault. Passive fault tolerance uses robust control techniques to ensure that the closed loop system remains insensitive to certain failures so that the impaired system continues to operate with the same controller and system structure. The effectiveness of the scheme depends upon the robustness of the nominal fault-free closed loop system. Hence, a unique controller, designed offline, can be used and online fault information is not required. In contrast, active fault tolerance aims at achieving the control objectives by adapting the control law to the faulty system behavior. In general, the latter phase is carried out by means of a parameterized controller which is suitably updated by a supervisory unit, on the basis of the information provided by the FDI filter. This approach relies upon a “certainty equivalence” idea extensively used in the context of adaptive control, since it is based on the explicit estimation of faults by the FDI filter and subsequent explicit reconfiguration of the controller in the presence of faults.

In this paper, we consider the FTC problem for systems that are governed by operational rules that can be modeled by discrete event systems (DES), i.e., dynamical systems with discrete state spaces and event-driven transitions. Several methodologies have been developed to solve the FDI problem for systems modeled as DES; see Benveniste, Haar, Fabre, and Jard (2003), Boel and van Schuppen (2002), Cordier and Rozé (2002), Debouk, Malik, and Brandin (2002), Garcia, Morant, and Blasco-Giménez (2002), Garcia and Yoo (2004), Genc and Lafortune (2007), Hadjicostis (2005), Jiang and Kumar (2006), Lin (1994), Lunze (2001), Pandalai and Holloway (2000), Paoli and Lafortune (2008), Pencole and Cordier (2002), Provan (2002), Sampath, Sengupta, and Lafortune (1995), Sampath, Sengupta, Lafortune, Sinnamohideen, and Teneketzis (1996) and Su and Wonham (2002), for a sample of this work including references to successful industrial applications. Less effort, however, has been made to solve the FTC problem in the DES framework; this problem has recently been studied in Darabi, Jafari, and Buczak (2003), Dumitrescu, Girault, Marchand, and Rutten (2007), Iordache and Antsaklis (2004), Park and Cho (2009), Rohloff (2005), Wen, Kumar, Huang, and Liu (2007a), Wen, Kumar, Huang, and Liu (2007b) and Wen, Kumar, Huang, and Liu (2008). In Dumitrescu et al. (2007), the problem of managing a set of real-time periodic tasks onto a set of processors upon the occurrence of a fault (considered as observable) on one or more processors is solved using optimal discrete controller synthesis techniques. In Iordache and Antsaklis (2004), the supervisory control technique for Petri nets based on place invariants is adapted to achieve robustness properties for systems in which faults and reconfigurations are modeled as changes in marking. In Park and Cho (2009), the authors present a formal method to design an optimal fault-tolerant scheduler for real-time multiprocessor systems with non-preemptive tasks such that all deadlines for active tasks are satisfied even in the presence of processor faults modeled as observable events. Rohloff (2005) deals with the problem of performing robust controller synthesis when the system is subject to failures on sensors, i.e., when previously observable events become unobservable.

In Wen et al. (2008), the authors propose a definition of fault tolerance based on the DES notions of language equivalence and convergence by means of control. Roughly speaking, a DES is said to be fault tolerant if every post-fault behavior is equivalent to a non-faulty behavior in a bounded number of steps; moreover, a supervisor is said to be a “fault-tolerant controller” if it is able to force fault tolerant behavior for the supervised DES. The authors provide a necessary and sufficient condition for the existence of a fault-tolerant supervisor able to enforce a specification for the non-faulty plant and a wider specification for the overall plant. Such an approach can therefore be cast in the framework of passive approaches.

We study the active approach to FTC for DES modeled as automata. Specifically, we want to design an architecture in which the supervisor actively reacts to the detection of a malfunctioning component in order to meet possibly degraded control specifications. To this aim, we describe a modeling procedure that results in a structured model of the controlled system containing a nominal part and a set of faulty parts. Starting from this model, we recall the notion of safe diagnosability (see Paoli & Lafortune, 2005) as a necessary step in order to achieve fault tolerant supervision of DES. We then introduce the new notion of safe controllability, which represents the capability, after the occurrence of a fault, of steering the system away from forbidden zones. We also define the new notion of active fault tolerant system with respect to post-fault specifications as the property of safely continuing operation after faults. We then present a general control architecture to deal with the FTC problem. This architecture is based on the use of a special kind of diagnoser, called diagnosing controller, which is used to safely detect faults and to switch between the nominal control policy and a bank of reconfigured control policies. In this sense, the exploited paradigm is that of switching control in which a high-level logic is used to switch between a bank of different controllers (see Darabi & Jafari, 2003 and Zhang & Jiang, 2001).

The main contributions of this work are as follows:

  • (i) the exploitation of a multiple-supervisor architecture to actively counteract the effect of faults;

  • (ii) the evaluation of the effect of the diagnostics algorithm on the performance of the architecture;

  • (iii) the definition of a new diagnoser called diagnosing controller, which realizes in a unique entity the switching architecture.

Fault tolerance is also an important research area in computer science, especially when dealing with distributed systems. Among the many works in this area, we wish to comment further on Attie et al. (2004), Kulkarni and Arora (2000) and Kulkarni and Ebnenasir (2004), and the references therein. Specifically, these works present a method for the synthesis of fault-tolerant programs that is based on a decision procedure for branching time temporal logic. The focus is on non-terminating concurrent programs consisting of a finite number of fixed sequential processes described by directed graphs. Computation tree logic (CTL) is used to allocate specifications on processes and to prove satisfiability. The faults that a concurrent program is subject to are categorized in terms of type, duration, observability, and repair properties; such faults are modeled as actions (guarded commands) whose execution perturbs the program state. When a fault occurs, a concurrent program in general need not satisfy its given nominal specification, but it must accomplish some given specification that can be classified in terms of how (and whether) the safety and the liveness parts of the given specification are respected in the presence of the faults (in masking tolerance, both the safety and the liveness parts are always respected; in fail-safe tolerance, only the safety part but not necessarily the liveness part is respected; in nonmasking tolerance, the liveness part is always respected but the safety part is only eventually respected). The problem solved in Attie et al. (2004) can be described as follows: given a problem specification (models of processes and global specifications), a fault specification (a set of fault actions), a problem-fault coupling specification (how programs react to faults) and a type of desired tolerance, use the CTL decision procedure to synthesize a concurrent program such that in the nominal situation the global specifications are satisfied, while, in the case of a fault, the desired tolerance properties are met for any evolution after the fault occurrence. The procedure turns out to be of exponential complexity. In Kulkarni and Ebnenasir (2004) the problem is extended to obtain multi-tolerance, i.e., multiple faults with different tolerance specifications.

The approach in the above works is in some sense complementary to the approach we develop in the present paper. In Attie et al. (2004), Kulkarni and Arora (2000) and Kulkarni and Ebnenasir (2004), the focus is on how to design a supervisor that “implicitly” meets nominal and faulty specifications; however, the problem of actively reacting to faults after they have been detected on-the-fly is not addressed. Thus, the approach in these works can be cast in the so-called implicit fault-tolerant methods, where fault tolerance is obtained through the nominal global supervisor, which is designed in order to implicitly react to faults; in this framework, fault diagnosis is of secondary importance and can be obtained by simply observing the actions of the supervisor. In contrast, in our approach, we exploit an active paradigm (active fault tolerance approach) to switch from a nominal supervisor to a reconfigured supervisor once online diagnostics detects a fault. In this approach, it is essential to have an explicit fault diagnosis phase (faults are always considered as unobservable), which must be reliable and fast.

This paper is organized as follows. In Section 2, the effect of unobservable faults on a supervised DES is described and modeled. In Section 3, the notions of diagnosable and safe-diagnosable DES are reviewed, the new notion of safe-controllable DES is introduced and guidelines to test it are given. In Section 4, the fault tolerant control architecture is presented and the diagnosing controller is defined as well as an algorithmic procedure to build it. Finally in Section 5, a simple example is used to illustrate the new notions and the control architecture introduced in the paper. In Section 6, a conclusion is provided. A preliminary version of the results in this paper was presented in Paoli, Sartini, and Lafortune (2008).

Section snippets

Supervisory control of DES with faults

Following the theory of supervisory control of DES (see, e.g., Chapter 3 of Cassandras & Lafortune, 2008), the system is modeled by an automaton $G=(X,E,\delta,x_0)$, where $X$ is the state space, $E$ is the set of events, $\delta$ is the partial transition function and $x_0$ is the initial state of the system. The behavior of the system is described by the prefix-closed language $L(G)$ generated by $G$. The event set $E$ is partitioned as $E=E_o\cup E_{uo}$, where $E_o$ represents the set of observable events (their occurrence can
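The modeling ingredients named in this snippet (a partial transition function, the generated prefix-closed language, and the split of events into observable and unobservable ones) can be sketched in a few lines of Python. This is only an illustrative sketch: the class `Automaton`, the helper `project`, and the three-state toy plant are assumptions made for the example and do not come from the paper.

```python
class Automaton:
    """Deterministic automaton G = (X, E, delta, x0) with a partial transition function."""

    def __init__(self, states, events, delta, x0):
        self.states = set(states)
        self.events = set(events)
        # (state, event) -> next state; a missing key means delta is undefined there.
        self.delta = dict(delta)
        self.x0 = x0

    def run(self, trace):
        """Return the state reached by `trace` from x0, or None if the trace is not in L(G)."""
        x = self.x0
        for e in trace:
            if (x, e) not in self.delta:
                return None
            x = self.delta[(x, e)]
        return x

    def language(self, max_len):
        """List the traces of L(G) of length <= max_len, shortest first."""
        traces = [()]
        frontier = [((), self.x0)]
        for _ in range(max_len):
            extended = []
            for trace, x in frontier:
                for e in sorted(self.events):
                    if (x, e) in self.delta:
                        extended.append((trace + (e,), self.delta[(x, e)]))
            traces.extend(t for t, _ in extended)
            frontier = extended
        return traces


def project(trace, observable):
    """Natural projection: erase every event that is not observable."""
    return tuple(e for e in trace if e in observable)


# Hypothetical toy plant (not from the paper): 'a', 'b' observable, 'f' unobservable.
G = Automaton(
    states={0, 1, 2},
    events={"a", "b", "f"},
    delta={(0, "a"): 1, (1, "b"): 0, (1, "f"): 2, (2, "b"): 2},
    x0=0,
)
E_o, E_uo = {"a", "b"}, {"f"}

print(G.run(("a", "f", "b")))         # state reached after the trace a f b
print(G.run(("b",)))                  # None, since b is not defined at x0
print(project(("a", "f", "b"), E_o))  # what an observer would record
```

Storing the transition function as a dictionary keeps partiality explicit: a trace belongs to the generated language exactly when every lookup along it succeeds, which is also why every prefix of a generated trace is itself generated.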

Safe controllability of DES

This section is concerned with the definition and testing of the property of safe controllability for the purpose of the fault tolerance objectives described in the preceding section. First, we recall the definition of diagnosability, introduced in Sampath et al. (1995), which states that a language L is diagnosable if it is possible to detect within a finite delay occurrences of faults using the record of observed events.

Definition 1 Diagnosable DES

A prefix-closed language L that is live and does not contain loops of
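The idea of detecting a fault from the record of observed events can be illustrated with a small state estimator that tracks pairs (state, fault flag) consistent with what has been observed. The sketch below is a simplified estimator in the spirit of a diagnoser, not the construction of Sampath et al. (1995); the function names and the toy transition table are assumptions made for the example.

```python
def unobservable_reach(estimate, delta, unobservable, fault):
    """Close a set of (state, faulty) pairs under unobservable transitions."""
    closed = set(estimate)
    stack = list(estimate)
    while stack:
        x, faulty = stack.pop()
        for e in unobservable:
            if (x, e) in delta:
                nxt = (delta[(x, e)], faulty or e == fault)
                if nxt not in closed:
                    closed.add(nxt)
                    stack.append(nxt)
    return closed


def update(estimate, event, delta, unobservable, fault):
    """Advance the estimate on one observed event, then add unobservable continuations."""
    moved = {(delta[(x, event)], faulty)
             for x, faulty in estimate if (x, event) in delta}
    return unobservable_reach(moved, delta, unobservable, fault)


def verdict(estimate):
    """'F' if the fault surely occurred, 'N' if it surely did not, else 'uncertain'."""
    if not estimate:
        raise ValueError("observation is inconsistent with the model")
    flags = {faulty for _, faulty in estimate}
    if flags == {True}:
        return "F"
    if flags == {False}:
        return "N"
    return "uncertain"


def diagnose(observations, delta, x0, unobservable, fault):
    """Return the verdict after each observed event."""
    estimate = unobservable_reach({(x0, False)}, delta, unobservable, fault)
    verdicts = []
    for event in observations:
        estimate = update(estimate, event, delta, unobservable, fault)
        verdicts.append(verdict(estimate))
    return verdicts


# Hypothetical toy plant: 'f' is the unobservable fault, 'c' is possible only after it.
delta = {(0, "a"): 1, (1, "b"): 0, (1, "f"): 2, (2, "c"): 2}

print(diagnose(["a", "c"], delta, 0, {"f"}, "f"))
print(diagnose(["a", "b"], delta, 0, {"f"}, "f"))
```

In this toy plant the estimator is uncertain right after `a`, because the fault may or may not have fired, and it resolves the ambiguity one observation later. A bounded resolution delay of this kind is what diagnosability asks for on every faulty trace.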

Active fault tolerance of DES

If language $L(G_{sup}^{n+f})$ is safe controllable, then it is always possible to detect any occurrence of event $f$ in a bounded number of observable events and without executing any forbidden action; moreover, in any continuation after the detection of fault $f$ that contains a forbidden action in $\Phi$, there always exists at least one controllable event $z$ that can be disabled to prevent the system from executing unsafe actions. Entering certain state $q_i^{FC}$ should therefore trigger an interrupt signal $INT_i$
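The switching behavior described here, in which an interrupt raised on fault detection replaces the nominal policy with one taken from a bank of reconfigured policies, can be sketched as follows. This is a deliberately reduced illustration and not the diagnosing controller of the paper: each policy is assumed to be a fixed set of disabled controllable events, and the class name, method names, and event labels are hypothetical.

```python
class SwitchingSupervisor:
    """Holds a nominal policy and a bank of reconfigured ones.

    Assumption for this sketch: a policy is simply the set of controllable
    events it disables, independent of the plant state.
    """

    def __init__(self, controllable, nominal_disabled, bank):
        self.controllable = frozenset(controllable)
        self.bank = {i: frozenset(d) for i, d in bank.items()}
        self.active = frozenset(nominal_disabled)
        self.mode = "nominal"
        # A supervisor may only disable controllable events.
        for disabled in [self.active, *self.bank.values()]:
            if not disabled <= self.controllable:
                raise ValueError("only controllable events can be disabled")

    def interrupt(self, i):
        """Hypothetical handler for interrupt i: switch to the i-th reconfigured policy."""
        self.active = self.bank[i]
        self.mode = f"reconfigured-{i}"

    def enabled(self, feasible):
        """Events the plant may execute under the active policy."""
        return set(feasible) - self.active


# Hypothetical events: 'a' and 'z' are controllable, 'u' is uncontrollable.
sup = SwitchingSupervisor(
    controllable={"a", "z"},
    nominal_disabled=set(),
    bank={1: {"z"}},  # after fault 1, disabling z keeps the plant out of forbidden actions
)

before = sup.enabled({"a", "z", "u"})
sup.interrupt(1)
after = sup.enabled({"a", "z", "u"})
print(sorted(before), sorted(after), sup.mode)
```

The constructor check reflects the requirement in the text that the event disabled after detection must be controllable; an uncontrollable event such as `u` stays enabled under every policy.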

An illustrative example

Consider the hydraulic system of Fig. 6(a); the system is composed of a tank $T$, a pump $P$, a set of valves ($V_1$, $V_2$, and $V_r$), and associated pipes. The pump $P$ is used to move fluid from the tank through the pipe and must be coordinated with the set of redundant valves. The system is equipped with a pressure sensor. The automaton modeling the set of valves is denoted by $G_1^{nom}$ and is shown in Fig. 6(b): events $op_1$ and $cl_1$ are used to open and close valve $V_1$, events $op_2$ and $cl_2$ are used to open and

Conclusions

The main contributions of this paper can be summarized as follows.

  • (i) We have investigated an active approach to FTC of DES that makes use of a multiple-supervisor architecture to actively counteract the effect of faults. The control algorithm employs online diagnostics to actively react to the detection of a malfunctioning component in order to eventually meet degraded control specifications.

  • (ii) We have evaluated the effect of the diagnostics algorithm on the FTC architecture, based on the idea that

References (48)

  • A. Paoli et al. Safe diagnosability for fault tolerant supervision of discrete event systems. Automatica (2005)
  • Y. Pencole et al. A formal framework for the decentralised diagnosis of large scale discrete event systems and its application to telecommunication networks. Artificial Intelligence (2005)
  • P.C. Attie et al. Synthesis of fault-tolerant concurrent programs. ACM Transactions on Programming Languages and Systems (2004)
  • A. Benveniste et al. Diagnosis of asynchronous discrete event systems, a net unfolding approach. IEEE Transactions on Automatic Control (2003)
  • M. Blanke et al. Diagnosis and fault-tolerant control (2003)
  • Boel, R., & van Schuppen, J. (2002). Decentralized failure diagnosis for discrete-event systems with constrained...
  • C. Cassandras et al. Introduction to discrete event systems (2008)
  • Chen, Y. -L., & Provan, G. (1997). Modeling and diagnosis of timed discrete event systems: a factory automation...
  • H. Cho et al. On supremal languages of classes of sublanguages that arise in supervisor synthesis problems with partial observation. Mathematics of Control, Signals, and Systems (1989)
  • M. Cordier et al. Diagnosing discrete-event systems: extending the diagnoser approach to deal with telecommunication networks. Journal of Discrete Event Dynamic Systems (2002)
  • A.B.H. Darabi et al. A control switching theory for supervisory control of discrete event systems. IEEE Transactions on Robotics and Automation (2003)
  • H. Darabi et al. A control switching theory for supervisory control of discrete event systems. IEEE Transactions on Robotics and Automation (2003)
  • Debouk, R., Malik, R., & Brandin, B. (2002). A modular architecture for diagnosis of discrete event systems. In...
  • Dumitrescu, E., Girault, A., Marchand, H., & Rutten, E. (2007). Optimal discrete controller synthesis for modeling...
  • Garcia, H., Morant, F., & Blasco-Giménez, R. (2002). Centralized modular diagnosis and the phenomenon of coupling. In...
  • H. Garcia et al. Model-based detection of routing events in discrete flow networks. Automatica (2004)
  • S. Genc et al. Distributed diagnosis of place-bordered Petri nets. IEEE Transactions on Automation Science and Engineering (2007)
  • A. Girault et al. Automating the addition of fault tolerance with discrete controller synthesis. Formal Methods in System Design (2009)
  • N.B. Hadj-Alouane et al. Centralized and distributed algorithms for on-line synthesis of maximal control policies under partial observation. Journal of Discrete Event Dynamic Systems: Theory and Applications (1996)
  • C. Hadjicostis. Probabilistic fault detection in finite-state machines based on state occupancy measurements. IEEE Transactions on Automatic Control (2005)
  • Inan, K. (1994). Nondeterministic supervision under partial observation. In Proceedings of the 11th international...
  • Iordache, M. V., & Antsaklis, P. J. (2004). Resilience to failure and reconfigurations in the supervision based on...
  • R. Isermann et al. Fault-tolerant drive-by-wire systems. IEEE Control Systems Magazine (2002)
  • S. Jiang et al. Diagnosis of repeated failures for discrete event systems with linear-time temporal logic specifications. IEEE Transactions on Automation Science and Engineering (2006)

    Andrea Paoli received the M.Eng. degree and the Ph.D. degree in 2000 and 2004 respectively, both from the University of Bologna. Since December 2008, he has held an assistant professor position at the University of Bologna. Since 2006, he has been a member of the IFAC Technical Committee on Fault Detection, Supervision and Safety of Technical Processes (SAFEPROCESS TC). He is co-author of about 50 technical-scientific publications and of a textbook on industrial automation. In July 2005 he won the AUTOMATICA best application paper award for the years 2002–2005 (for a paper co-authored with C. Bonivento, A. Isidori, L. Marconi). His current main research interests concern industrial automation and in particular fault tolerant control architectures and discrete-event systems.

    Matteo Sartini received the M.Eng. degree and the Ph.D. degree in 2005 and 2010 respectively, both from the University of Bologna. In July 2005 and January 2010 he won, respectively, a research grant supported by the University of Bologna and Emilia-Romagna regional Council for the project titled Diagnosis and control for fault tolerant automation systems and a research grant supported by European JTI Artemis funded project CESAR (cost-efficient methods and processes for safety relevant embedded systems). His main research interests are industrial automation software architectures, discrete event systems, and fault tolerant control architectures.

    Stéphane Lafortune received the B.Eng. degree from Ecole Polytechnique de Montréal in 1980, the M.Eng. degree from McGill University in 1982, and the Ph.D. degree from the University of California at Berkeley in 1986, all in electrical engineering. Since September 1986, he has been with the University of Michigan, Ann Arbor, where he is a professor of Electrical Engineering and Computer Science.

    Dr. Lafortune is a Fellow of the IEEE (1999). He received the Presidential Young Investigator Award from the National Science Foundation in 1990 and the George S. Axelby Outstanding Paper Award from the Control Systems Society of the IEEE in 1994 (for a paper co-authored with S.L. Chung and F. Lin) and in 2001 (for a paper co-authored with G. Barrett).

    Dr. Lafortune is a member of the editorial boards of the Journal of Discrete Event Dynamic Systems: Theory and Applications and of the International Journal of Control. His research interests are in discrete event systems and include multiple problem domains: modeling, diagnosis, control, optimization, and applications to computer systems. He is co-developer of the software packages DESUMA and UMDES. He co-authored, with C. Cassandras, the textbook Introduction to Discrete Event Systems—Second Edition (Springer, 2008).

    The research of the first and second author is in part supported by MIUR and in part supported by the European Artemis Joint Undertaking funded project CESAR: Cost-efficient methods and processes for safety relevant embedded systems. The research of the third author is supported in part by NSF grants ECCS-0624821 and CNS-0930081. The material in this paper was partially presented at the 17th IFAC World Congress, July 6–11, 2008, Seoul, Korea. This paper was recommended for publication in revised form by Associate Editor Bart De Schutter under the direction of Editor Ian R. Petersen.
