Brief Paper

A strategy for controlling nonlinear systems using a learning automaton☆
Introduction
The behavior of a complex system can be observed by measuring a number of external variables such as displacements, pressures, temperatures, etc. (Vasseur, 1982). In classical approaches, model-based adaptive control strategies have been extensively used in many industrial applications (e.g. robot control, process control, etc.). In a model-based adaptive control strategy, the parameters of the model are estimated by minimizing the error between the model and the system (Brogan, 1974; Fargeon, 1986).
In many practical control problems, a model and the actual system may diverge considerably because of parametric and nonparametric uncertainties such as unmodeled dynamics, measurement noise and computation roundoff errors (Zomaya, 1994). Moreover, model-based computation is usually heavy for complex nonlinear systems, which makes real-time control rather difficult. It is therefore necessary to develop model-free control strategies that use only the information extracted from external measured variables.
A control strategy can be built on the theory of reinforcement learning, which has been successfully applied to problems involving decision making under uncertainty (Narendra & Thathachar, 1989; Barto, Sutton & Anderson, 1983; Zikidis & Vasilakos, 1996). In general, a reinforcement learning algorithm is embedded in an adaptive element that can be applied to different tasks. It conducts a stochastic search of the output space, using only an approximate indication of the “correctness” (reward) of the output value it produces in each iteration. Based on this indication, the algorithm generates, in each iteration, an error signal giving the difference between the actual and correct responses, and the adaptive element uses this error signal to update its parameters. This sequence is repeated until the error signal tends to zero.
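This search–reward–update loop can be sketched in a few lines of Python. The sketch below is an illustration, not taken from the paper: the reward function, the hidden target value 0.7, the Gaussian search noise, and the step sizes are all assumptions made for the example.

```python
import random

def reward(y, target=0.7):
    # The environment returns only a scalar indication of "correctness";
    # it never reveals the correct output (here, the assumed target 0.7).
    return max(0.0, 1.0 - abs(y - target))

def train(steps=5000, lr=0.05, sigma=0.2, seed=0):
    rng = random.Random(seed)
    mu = 0.0         # adjustable parameter of the adaptive element
    baseline = 0.0   # running estimate of the expected reward
    for _ in range(steps):
        y = mu + rng.gauss(0.0, sigma)   # stochastic search of the output space
        r = reward(y)
        # Reinforcement update: move mu toward outputs that beat the baseline
        mu += lr * (r - baseline) * (y - mu)
        baseline += 0.1 * (r - baseline)
    return mu
```

After enough iterations, `mu` settles near the rewarded region of the output space even though the element never observes the correct output directly.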
Compared with supervised learning methods, reinforcement learning algorithms require less information. In some problems, such as real-time control and monitoring of dynamic systems, a priori information is hard or expensive to obtain; reinforcement learning is then more suitable than supervised learning (Zikidis & Vasilakos, 1996).
A great number of reinforcement learning algorithms have been developed for controlling dynamic systems and for other tasks. Barto et al. (1983) used neuronlike adaptive elements to balance the pole of a cart–pole system. Their learning system is composed of a single associative search element (ASE) and a single adaptive critic element (ACE). The ASE constructs associations between input and output by searching under the influence of reinforcement feedback, and the ACE constructs a more informative evaluation function than reinforcement feedback alone can provide.
Watkins (1992) developed a general incremental learning method called Q-learning to model reinforcement in artificial creatures and robots. It was initially used for solving Markovian decision problems with incomplete information and was later considered as a method for adaptive on-line control. At each step, the Q-learning algorithm directly estimates the optimal Q-values of pairs of states and admissible control actions according to the current value of an evaluation function. The controller randomly selects an action using these estimated Q-values. This procedure is repeated until the goal state is reached. If the optimal Q-values are available, an optimal control strategy can be determined with relatively little computation. The behavior of Q-learning is rather close to data from animal experiments, especially when the number of states is small.
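A minimal sketch of this procedure is given below. The environment (a toy five-state chain with a single rewarded goal state, not the paper's benchmark), the ε-greedy action selection, and the step sizes are assumptions made for illustration.

```python
import random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    # Toy chain: states 0..4, actions 0 (left) / 1 (right); reaching the
    # goal state 4 yields reward 1, every other transition yields 0.
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]   # Q-values for (state, action) pairs
    for _ in range(episodes):
        s = 0
        while s != 4:
            # Randomized action selection based on current Q-value estimates
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: q[s][i])
            s2 = min(max(s + (1 if a == 1 else -1), 0), 4)
            r = 1.0 if s2 == 4 else 0.0
            # Direct incremental estimate of the optimal Q-value
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

Once the estimates converge, the optimal policy ("move right" in every state, here) is read off from the Q-values with little extra computation, as noted above.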
In this paper, we propose a control strategy based on a learning automaton in order to control dynamic nonlinear systems. The basic idea is briefly presented as follows.
Learning automata are adaptive decision-making devices operating in unknown random environments (Narendra & Thathachar, 1974, 1989). A learning automaton has a finite set of actions, and each action has a certain probability (unknown to the automaton) of being rewarded by the controlled system, which is considered as the environment of the automaton. The aim is to learn to choose the optimal action (i.e. the action with the highest probability of being rewarded) through repeated interaction with the system. If the learning algorithm is chosen properly, the iterative process of interacting with the system can be made to result in the selection of the optimal action.
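This interaction can be sketched with the classical linear reward–inaction (L_RI) scheme, one of the standard automaton updates; the three reward probabilities below are arbitrary assumptions made for the example.

```python
import random

def l_ri(reward_probs, steps=20000, lr=0.01, seed=0):
    """Linear reward-inaction automaton in a stationary random environment."""
    rng = random.Random(seed)
    n = len(reward_probs)
    p = [1.0 / n] * n                        # action probabilities, initially uniform
    for _ in range(steps):
        # Sample an action from the current probability vector
        a = rng.choices(range(n), weights=p)[0]
        if rng.random() < reward_probs[a]:   # the environment rewards the action
            # Move probability mass toward the rewarded action
            p = [(1 - lr) * pi for pi in p]
            p[a] += lr
        # On penalty, L_RI leaves the probabilities unchanged
    return p

p = l_ri([0.2, 0.5, 0.8])   # probability mass concentrates on the best action
```

With a small step size, the probability of action 2 (the action most likely to be rewarded) approaches one, which is the convergence behaviour described above.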
Shapiro and Narendra (1969) adopted a stochastic automata model to find an optimal solution for multi-modal performance criteria. Thathachar and Sastry (1985) proposed an estimator-based model called the pursuit algorithm, which is very simple and converges rapidly in simulations. Oommen and Lanctot (1990) proposed a discretized pursuit algorithm that improves the convergence speed of the automaton. Both the continuous and the discretized versions of the pursuit algorithm are known to be ε-optimal (Oommen & Lanctot, 1990).
In the learning automaton of our control strategy, we propose a new reinforcement scheme based on the continuous pursuit algorithm. The proposed automaton tries to find a series of local optimal actions, which are applied to the system by the following steps:
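The continuous pursuit update that our reinforcement scheme builds on can be sketched as follows. This is a generic stationary-environment version, not the proposed scheme itself, and the reward probabilities are assumptions made for the example.

```python
import random

def pursuit(reward_probs, steps=5000, lam=0.01, seed=0):
    """Continuous pursuit automaton: the probability vector 'pursues' the
    action whose estimated reward probability is currently the highest."""
    rng = random.Random(seed)
    n = len(reward_probs)
    p = [1.0 / n] * n            # action probability vector
    d_hat = [0.0] * n            # sample-mean estimates of reward probabilities
    counts = [0] * n
    for _ in range(steps):
        a = rng.choices(range(n), weights=p)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        d_hat[a] += (r - d_hat[a]) / counts[a]   # update the estimate for action a
        best = max(range(n), key=lambda i: d_hat[i])
        # Move p a small step toward the unit vector of the current best estimate
        p = [(1 - lam) * pi + (lam if i == best else 0.0) for i, pi in enumerate(p)]
    return p, d_hat
```

With a small step λ this update converges rapidly toward the action with the highest estimated reward probability; the discretized version of Oommen and Lanctot replaces the continuous step by jumps that are multiples of a fixed resolution.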
This paper is organized as follows. We present the principle of learning automata and their evaluation criteria in Section 2. In Section 3, we describe our control strategy. Section 4 gives some discussion of the proposed algorithm. Our control strategy has been applied to a nonlinear system: a continuous production bioprocess. The corresponding simulation results, as well as a comparison with learning algorithms of Barto's adaptive critic type and with Q-learning, are given in Section 5. The final conclusion is given in Section 6.
Learning automata
Fig. 1 illustrates how a stochastic automaton works in feedback connection with a random environment. At each instant, the output of the automaton (its action) is also the input of the system, and the input of the automaton is generated from the current output of the system together with its desired value.
A learning automaton is completely defined by a tuple that includes the set of all actions of the automaton. The action of the automaton at each instant
Control strategy
In this section, we present our control strategy for unidimensional nonlinear sampled systems. At each sampling instant, the strategy generates a control value corresponding to the action selected by the proposed learning automaton.
Given the control input and the output of the system, together with the desired value of the output, the objective of the proposed control strategy is to select a series of actions applied to the system so that the output approaches its desired value,
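The idea can be illustrated schematically (this is not the authors' exact reinforcement scheme): a pursuit-style automaton chooses among a few candidate control values at each sampling instant, and an action is rewarded whenever it brings the output closer to its desired value. The toy plant model, the action set, and the step sizes below are all assumptions made for the example.

```python
import math
import random

def control_demo(steps=400, lam=0.1, seed=0):
    rng = random.Random(seed)
    actions = [0.0, 0.25, 0.5, 0.75, 1.0]   # admissible control values (assumed)
    n = len(actions)
    p = [1.0 / n] * n                       # action probabilities
    d_hat = [0.5] * n                       # recency-weighted reward estimates
    y, y_d = 0.0, 0.8                       # system output and its desired value
    for _ in range(steps):
        a = rng.choices(range(n), weights=p)[0]
        u = actions[a]
        err_prev = abs(y - y_d)
        y += 0.2 * (math.tanh(2.0 * u) - y)  # toy nonlinear plant (assumed)
        r = 1.0 if abs(y - y_d) < err_prev else 0.0
        # Exponential forgetting, because the environment is nonstationary:
        # which action is "good" depends on the current position of y
        d_hat[a] += 0.1 * (r - d_hat[a])
        best = max(range(n), key=lambda i: d_hat[i])
        p = [(1 - lam) * pi + (lam if i == best else 0.0) for i, pi in enumerate(p)]
    return y
```

No single candidate action holds the output exactly at the setpoint in this toy plant, so the automaton ends up alternating between neighbouring control values, which mirrors the series of local optimal actions mentioned in the introduction.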
Application in bioprocess control
The proposed strategy has been successfully applied to the control of several nonlinear systems. Next, we present its application to a bioprocess control problem.
The behavior of the continuous bioprocess is described by the dynamic equations in Ferret, Lakrori and Cheruy (1992) and Dantigny, Ninow and Lakrori (1991). This strongly nonlinear system functions as follows. Given a desired substrate concentration, we adjust the dilution rate (the input to the bioprocess) so that the real
Conclusion
Several characteristics of the proposed strategy are summarized as follows:
(1) Our learning automaton acts in a nonstationary environment in which the reward probabilities are defined as functions of the system response and thus vary with the relative position between the output and its desired value. Consequently, there exist only local optimal actions, each available only for several sampling periods. The control strategy is designed to drive the system output to the desired value by alternately applying the control values
References (15)

- The control of fed-batch fermentation processes — a survey. Automatica (1987).
- Zikidis & Vasilakos (1996). ASAFES2: a novel neuro-fuzzy architecture for fuzzy computing, based on functional reasoning. Fuzzy Sets and Systems.
- Barto, Sutton & Anderson (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics.
- Brogan, W. L. (1974). Modern control theory. Quantum Publishers,...
- Dantigny, Ninow & Lakrori (1991). A new control strategy for yeast production based on the L/A approach. Applied Microbiology and Biotechnology.
- Fargeon, C. (1986). Commande numérique des systèmes: applications aux engins mobiles et aux robots. Masson:...
- Ferret, E., Lakrori, M., & Cheruy, A. (1992). Prise en compte des contraintes en commande de procédé: les algorithmes...
Xianyi Zeng was born in Tianjin, People's Republic of China, in 1963. He received the degree in Computer Science and Technology from Tsinghua University, Beijing, People's Republic of China, in 1986, and the Ph.D. in automation from the Université des Sciences et Technologies de Lille, France, in 1992. He now works as an associate professor at the ENSAIT Textile Engineering Institute, Roubaix, France. His research interests include pattern recognition, data analysis, computer modeling and their applications in the textile industry.
Jiande Zhou was born in Zhejiang, People's Republic of China, in 1966. He received the degree in automation from Beijing Polytechnic University, Beijing, People's Republic of China, the Master's degree in discrete mathematics from the University of Aix-Marseille-II, Marseille, France, the DEA degree in production engineering from the Lille-I University, and the Ph.D. in production engineering from the Louis Pasteur University of Strasbourg, France, in 1989, 1993, 1994 and 1998, respectively. He worked as an engineer at the Research Center of Sciences and Applications of Space, Chinese Academy of Sciences, Beijing, China, from 1989 to 1990, and at the Eureka Soft. Telephony and Telecommunication Company, Paris, France, from 1998 to 1999. He is now an engineer at the TECH-ASI Computers Company, Paris, France. His research interests include production engineering, telephony and telecommunication, combinatorial optimization, intelligent control, data analysis, and artificial intelligence.
Christian Vasseur was born in Cambrai, France, on January 5, 1947. He received the “Ingénieur” degree from the “Institut Industriel du Nord” Engineering Institute (France) in 1970, the “Docteur Ingénieur” degree in 1972 and the Ph.D. degree in 1982, both from the Lille-1 University (France). From 1972 to 1974 he worked as a Research Assistant in the Department of Electrical Engineering of Sherbrooke University (Quebec, Canada) in the area of Biological and Medical Engineering. He joined the Lille-1 University, France, in 1974. As a professor at this university, he created a research team in signal processing and automatic classification in the 1980s. From 1988 to 1997 he was the head of the ENSAIT National Textile Engineering Institute in Roubaix, France. Since 1997 he has been the head of the I3D Automation Laboratory (Interaction, Image and Decision-Making Engineering) at the Lille-1 University. Dr. Vasseur has published over 100 scientific papers and communications. His interests are in the field of real-time automatic classification applied to signal and image processing. Concerning images, he is specialised in medical imaging (MRI, CT, etc.) used in stereotaxy: preparation of operating protocols, modelling of tumoral volumes, dose optimisation in radiotherapy, and computer-aided surgery. For more details see: http://www-i3d.univ-lille1.frcrva/index.htm
☆ This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor P.J. Fleming under the direction of Editor S. Skogestad.