Automatica

Volume 36, Issue 10, October 2000, Pages 1517-1524

Brief Paper
A strategy for controlling nonlinear systems using a learning automaton

https://doi.org/10.1016/S0005-1098(00)00066-2

Abstract

This paper presents an application of a learning automaton (LA) to nonlinear system control. The proposed control strategy uses a learning automaton whose reinforcement scheme is based on the pursuit algorithm and which interacts with a nonstationary environment. Modulated by an adaptive mechanism, the LA selects, at each control period, a locally optimal action, which serves as input to the controlled system. During the control procedure, the system output reflects the changes occurring inside the system and is used to generate reward/penalty responses for the learning automaton.

Introduction

The behavior of a complex system can be observed by measuring a number of external variables such as displacements, pressures, temperatures, etc. (Vasseur, 1982). In classical approaches, model-based adaptive control strategies have been extensively used in many industrial applications (e.g. robot control, process control, etc.). In a model-based adaptive control strategy, the parameters of the model are estimated by minimizing the error between the model and the system (Brogan, 1974; Fargeon, 1986).

In many practical control problems, a model and its system may diverge considerably because of parametric and nonparametric uncertainties such as unmodeled dynamics, measurement noise and computation round-off errors (Zomaya, 1994). Moreover, model-based computation is usually heavy for complex nonlinear systems, which makes real-time control rather difficult. It is therefore necessary to develop model-free control strategies that use only the information extracted from externally measured variables.

A control strategy can be built on the theory of reinforcement learning, which has been successfully applied to problems involving decision making under uncertainty (Narendra & Thathachar, 1989; Barto, Sutton & Anderson, 1983; Zikidis & Vasilakos, 1996). In general, a reinforcement learning algorithm is embedded in an adaptive element assigned to a given task. It conducts a stochastic search of the output space, using only an approximate indication of the "correctness" (reward) of the output value it produces in each iteration. Based on this indication, the algorithm generates, in each iteration, an error signal measuring the difference between the actual and the correct response, and the adaptive element uses this error signal to update its parameters. The sequence is repeated until the error signal tends to zero.
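As a minimal illustration of this loop (not the scheme proposed in this paper), the sketch below perturbs a scalar output at random and uses the resulting change in reward as an approximate error signal; the names reinforcement_iteration, act and reward, as well as the update rule, are hypothetical.

```python
import random

def reinforcement_iteration(theta, act, reward, step=0.5, sigma=0.1):
    """One iteration of a generic reinforcement-learning loop (illustrative sketch).

    The adaptive element perturbs its output at random, observes how the reward
    changes, and uses that change as an approximate error signal to update theta.
    """
    y_nominal = act(theta)                                   # current output of the element
    y_trial = y_nominal + random.gauss(0.0, sigma)           # stochastic search of the output space
    error = reward(y_trial) - reward(y_nominal)              # reinforcement-derived error signal
    theta = theta + step * error * (y_trial - y_nominal)     # move theta toward better-rewarded outputs
    return theta
```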

Compared with supervised learning methods, reinforcement learning algorithms require less information. In problems such as real-time control and monitoring of dynamic systems, a priori information is hard or expensive to obtain, so reinforcement learning is more suitable than supervised learning (Zikidis & Vasilakos, 1996).

A great number of reinforcement learning algorithms have been developed for controlling dynamic systems and other tasks. Barto et al. (1983) used neuronlike adaptive elements to balance the pole of a cart–pole system. Their learning system is composed of a single associative search element (ASE) and a single adaptive critic element (ACE). The ASE constructs associations between input and output by searching under the influence of reinforcement feedback, and the ACE constructs a more informative evaluation function than reinforcement feedback alone can provide.
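A schematic sketch of this ASE/ACE structure is given below, assuming a TD-style internal reinforcement for the critic; the class name, default coefficients and trace equations are simplified illustrations, not a reproduction of the original implementation.

```python
import numpy as np

class ASEACE:
    """Simplified sketch of the ASE/ACE pair of Barto et al. (1983)."""

    def __init__(self, n, alpha=0.5, beta=0.2, gamma=0.95, delta=0.9, lam=0.8):
        self.w = np.zeros(n)       # ASE weights (action side)
        self.v = np.zeros(n)       # ACE weights (evaluation side)
        self.e = np.zeros(n)       # ASE eligibility trace
        self.xbar = np.zeros(n)    # ACE stimulus trace
        self.p_prev = 0.0          # previous prediction of the ACE
        self.alpha, self.beta = alpha, beta
        self.gamma, self.delta, self.lam = gamma, delta, lam

    def act(self, x):
        """ASE: stochastic binary action obtained by thresholding w.x plus noise."""
        y = 1.0 if self.w @ x + np.random.normal(0.0, 0.01) > 0 else -1.0
        self.e = self.delta * self.e + (1 - self.delta) * y * x
        return y

    def learn(self, x, r):
        """ACE turns the raw reward r into a more informative internal signal."""
        p = float(self.v @ x)                        # prediction of future reward
        r_hat = r + self.gamma * p - self.p_prev     # internal (TD-like) reinforcement
        self.w += self.alpha * r_hat * self.e        # ASE: associative search update
        self.v += self.beta * r_hat * self.xbar      # ACE: improve the evaluation
        self.xbar = self.lam * self.xbar + (1 - self.lam) * x
        self.p_prev = p
```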

Watkins (1992) developed a general incremental learning method called Q-learning to model reinforcement in artificial creatures and robots. It was initially used for solving Markovian decision problems with incomplete information and was later considered as a method for adaptive on-line control. At each step, the Q-learning algorithm directly estimates the optimal Q-values of pairs of states and admissible control actions according to the current value of an evaluation function. The controller randomly selects an action using these estimated Q-values. This procedure is repeated until the goal state is reached. If the optimal Q-values are available, an optimal control strategy can be determined with relatively little computation. The behavior of Q-learning is rather close to data from animal experiments, especially when the number of states is small.
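A minimal tabular sketch of one Q-learning step is shown below; the ε-greedy selection rule, the dictionary representation of Q and the function names are assumptions made for illustration, not the controller used later in this paper.

```python
import random

def q_learning_step(Q, s, actions, environment, alpha=0.1, gamma=0.9, eps=0.1):
    """One step of tabular Q-learning (generic sketch).

    Q maps (state, action) pairs to estimated optimal values;
    environment(s, a) returns the reward and the next state."""
    # epsilon-greedy selection using the current Q-value estimates
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda u: Q.get((s, u), 0.0))
    r, s_next = environment(s, a)
    best_next = max(Q.get((s_next, u), 0.0) for u in actions)
    # incremental update toward the one-step lookahead estimate
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
    return s_next
```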

In this paper, we propose a control strategy based on a learning automaton in order to control dynamic nonlinear systems. The basic idea is briefly presented as follows.

Learning automata are adaptive decision-making devices operating in unknown random environments (Narendra & Thathachar, 1974, 1989). A learning automaton has a finite set of actions, and each action has a certain probability (unknown to the automaton) of being rewarded by the controlled system, which is considered as the environment of the automaton. The aim is to learn to choose the optimal action (i.e. the action with the highest probability of being rewarded) through repeated interaction with the system. If the learning algorithm is chosen properly, this iterative interaction can be made to result in the selection of the optimal action.

Shapiro and Narendra (1969) adopted a stochastic automata model to find an optimal solution for multi-modal performance criteria. Thathachar and Sastry (1985) proposed an estimator-based automaton model called the pursuit algorithm. This algorithm is very simple and converges rapidly in simulations. Oommen and Lanctot (1990) proposed an automaton model using a discretized pursuit algorithm, which improves the convergence speed of the automaton. Both the continuous and the discretized versions of the pursuit algorithm are known to be ε-optimal (Oommen & Lanctot, 1990).
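For reference, one iteration of the continuous pursuit algorithm can be sketched as follows (after Thathachar & Sastry, 1985): the automaton keeps a running estimate of each action's reward probability and moves its action-probability vector a small step toward the unit vector of the currently best-estimated action. Variable names and the learning rate lam are illustrative; the new reinforcement scheme proposed in Section 3 builds on this basic step.

```python
import random

def pursuit_step(p, d_hat, counts, rewards, environment, lam=0.05):
    """One iteration of the continuous pursuit algorithm (sketch).

    p       : action probability vector
    d_hat   : running estimates of each action's reward probability
    counts  : number of times each action has been chosen
    rewards : total reward collected by each action
    environment(i) returns 1 (reward) or 0 (penalty) for action i."""
    r = len(p)
    i = random.choices(range(r), weights=p)[0]     # sample an action from p
    beta = environment(i)
    counts[i] += 1
    rewards[i] += beta
    d_hat[i] = rewards[i] / counts[i]              # update the reward-probability estimate
    m = max(range(r), key=lambda j: d_hat[j])      # currently best-estimated action
    # move the probability vector a small step toward the unit vector e_m
    for j in range(r):
        target = 1.0 if j == m else 0.0
        p[j] += lam * (target - p[j])
    return i, beta
```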

In the learning automaton of our control strategy, we propose a new reinforcement scheme based on the continuous pursuit algorithm. The proposed automaton tries to find a series of locally optimal actions, which are applied to the system step by step; the procedure is detailed in Section 3.


This paper is organized as follows. Section 2 presents the principle of learning automata and their evaluation criteria. Section 3 describes our control strategy. Section 4 discusses the proposed algorithm. The strategy has been applied to a nonlinear system, a continuous production bioprocess; the corresponding simulation results, together with a comparison with learning algorithms of Barto's adaptive-critic type and Q-learning, are given in Section 5. Section 6 concludes the paper.

Section snippets

Learning automata

Fig. 1 illustrates how a stochastic automaton works in feedback connection with a random environment. The output α(k) of the automaton at instant k (its action) is also the input to the system. At instant k, the input β(k) of the automaton is generated from the current output Y(k) of the system and its desired value Y.

A learning automaton is completely defined by (A,Q,R,T), where A={α1,α2,…,αr} is the set of all actions of the automaton. The action of the automaton at instant k is denoted α(k)…
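The feedback connection of Fig. 1 can be sketched as follows; the function names and the reward rule (reward when the output error shrinks) are assumptions made for illustration only, since the paper's own response rule is defined in Section 3.

```python
def feedback_loop(select_action, apply_control, update, y_star, y0, n_steps):
    """Schematic of the automaton/environment feedback loop of Fig. 1.

    select_action() -> alpha(k): action chosen by the automaton
    apply_control(alpha) -> Y(k): output of the controlled system
    update(alpha, beta): reinforcement scheme of the automaton"""
    y_prev = y0
    for k in range(n_steps):
        alpha_k = select_action()            # automaton output = system input
        y_k = apply_control(alpha_k)         # system responds with Y(k)
        # assumed rule: reward (beta = 1) when the output error shrinks
        beta_k = 1 if abs(y_k - y_star) < abs(y_prev - y_star) else 0
        update(alpha_k, beta_k)              # environment response fed back to the automaton
        y_prev = y_k
```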

Control strategy

In this section, we present our control strategy for unidimensional nonlinear sampled systems. At each sampling instant k, this control strategy generates a control U(k) corresponding to the action selected by the proposed learning automaton.

Assuming that U(k) and Y(k) are the control input and the output of the system, respectively, and that Y is the desired value of the output, the objective of the proposed control strategy is to select a series of actions applied to the system so that Y(k) approaches Y…

Application in bioprocess control

The proposed strategy has been successfully applied to the control of several nonlinear systems. Next, we present its application to a bioprocess control problem.

The behavior of the continuous bioprocess is described by the dynamic equations in Ferret, Lakrori and Cheruy (1992) and Dantigny, Ninow and Lakrori (1991). This strongly nonlinear system functions as follows. Given a desired substrate concentration S, we adjust the dilution rate D (input to the bioprocess) so that the real…
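For orientation only, a generic chemostat model with Monod kinetics is sketched below; it merely illustrates the structure of such a process (dilution rate D as input, substrate concentration S as controlled output) and is not the model of Ferret, Lakrori and Cheruy (1992) used in the paper. All parameter values are placeholders.

```python
def chemostat_step(S, X, D, dt, S_in=20.0, mu_max=0.4, K_s=1.0, Y_xs=0.5):
    """One Euler step of a generic substrate/biomass chemostat model.

    S: substrate concentration, X: biomass concentration, D: dilution rate."""
    mu = mu_max * S / (K_s + S)              # Monod specific growth rate
    dS = D * (S_in - S) - mu * X / Y_xs      # substrate balance
    dX = (mu - D) * X                        # biomass balance
    return S + dt * dS, X + dt * dX
```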

Conclusion

Several characteristics of the proposed strategy are summarized as follows:

(1) Our learning automaton acts on a nonstationary environment in which the reward probabilities, which govern the response β(k), vary with the relative position between Y(k) and Y. Hence, there exist only locally optimal actions, each valid for only a few sampling periods. This control strategy is designed to drive the system output to the desired value by alternately applying the control values…

References (15)

  • Johnson, A. (1987). The control of fed-batch fermentation processes — a survey. Automatica.
  • Zikidis, K. C., & Vasilakos, A. V. (1996). ASAFES2: A novel, neuro-fuzzy architecture for fuzzy computing, based on functional reasoning. Fuzzy Sets and Systems.
  • Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics.
  • Brogan, W. L. (1974). Modern control theory. Quantum Publishers,...
  • Dantigny, P., Ninow, J. L., & Lakrori, M. (1991). A new control strategy for yeast production based on the L/A approach. Applied Microbiology and Biotechnology.
  • Fargeon, C. (1986). Commande numérique des systèmes: applications aux engins mobiles et aux robots. Masson:...
  • Ferret, E., Lakrori, M., & Cheruy, A. (1992). Prise en compte des contraintes en commande de procédé: les algorithmes...


Xianyi Zeng was born in Tianjin, People's Republic of China, in 1963. He received the degree in Computer Science and Technology from Tsinghua University, Beijing, People's Republic of China, in 1986, and the Ph.D. in automation from the Université des Sciences et Technologies de Lille, France, in 1992. He now works as an associate professor at the ENSAIT Textile Engineering Institute, Roubaix, France. His research interests include pattern recognition, data analysis, computer modeling and their applications in the textile industry.

Jiande Zhou was born in Zhejiang, People's Republic of China, in 1966. He received the degree in automation from Beijing Polytechnic University, Beijing, People's Republic of China, the Master's degree in discrete mathematics from the University of Aix-Marseille-II, Marseille, France, the DEA degree in production engineering from the Lille-I University, and the Ph.D. in production engineering from the Louis Pasteur University of Strasbourg, France, in 1989, 1993, 1994 and 1998, respectively. He worked as an engineer at the Research Center of Sciences and Applications of Space, Chinese Academy of Sciences, Beijing, China, from 1989 to 1990, and at the Eureka Soft. Telephony and Telecommunication Company, Paris, France, from 1998 to 1999. He is now an engineer at the TECH-ASI Computers Company, Paris, France. His research interests include production engineering, telephony and telecommunications, combinatorial optimization, intelligent control, data analysis and artificial intelligence.

Christian Vasseur was born in Cambrai, France, on January 5, 1947. He received the "Ingénieur" degree from the "Institut Industriel du Nord" Engineering Institute (France) in 1970, the "Docteur Ingénieur" degree in 1972 and the Ph.D. degree in 1982, both from the Lille-1 University (France). From 1972 to 1974 he worked as a Research Assistant in the Department of Electrical Engineering of Sherbrooke University (Quebec, Canada) in the area of biological and medical engineering. He joined the Lille-1 University, France, in 1974. As a professor at this university, he created a research team in signal processing and automatic classification in the 1980s. From 1988 to 1997 he was the head of the ENSAIT National Textile Engineering Institute in Roubaix, France. Since 1997 he has been the head of the I3D Automation Laboratory (Interaction, Image and Decision-Making Engineering) at the Lille-1 University. Dr. Vasseur has published over 100 scientific papers and communications. His interests are in the field of real-time automatic classification applied to signal and image processing. Concerning images, he specialises in medical imaging (MRI, CT, etc.) used in stereotaxy: preparation of operating protocols, modelling of tumoural volumes, dose optimisation in radiotherapy, and computer-aided surgery. For more details see: http://www-i3d.univ-lille1.frcrva/index.htm

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor P.J. Fleming under the direction of Editor S. Skogestad.
