Multiaction learning automata possessing ergodicity of the mean

doi:10.1016/0020-0255(85)90049-0

Information Sciences

Volume 35, Issue 3, June 1985, Pages 183-198

https://doi.org/10.1016/0020-0255(85)90049-0

Abstract

Multiaction learning automata that update their action probabilities on the basis of the responses they get from an environment are considered in this paper. The automata update the probabilities according to whether the environment responds with a reward or a penalty. Learning automata are said to possess ergodicity of the mean if the mean action probability is the state probability (or unconditional probability) of an ergodic Markov chain. In an earlier paper [11] we considered when a two-action learning automaton is ergodic in the mean (EM). The family of such automata was characterized completely by proving necessary and sufficient conditions for an automaton to be EM. In this paper, we generalize the results of [11] and obtain necessary and sufficient conditions for the multiaction learning automaton to be EM. These conditions involve two families of probability updating functions. It is shown that for the automaton to be EM the two families must be linearly dependent. The vector defining the linear dependence is the only vector parameter that controls the rate of convergence of the automaton. Further, a technique for reducing the variance of the limiting distribution is discussed. Just as in the two-action case, it is shown that the set of absolutely expedient schemes and the set of schemes that possess ergodicity of the mean are mutually disjoint.
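To make the definition of ergodicity in the mean concrete, here is a minimal Python sketch built on an assumption: it uses the standard multiaction linear reward-penalty (L_R-P) scheme with a single step size `a` for both reward and penalty, a scheme from the general learning-automata literature and not the family characterized in this paper. The environment is assumed stationary, with `c[i]` the penalty probability of action i, and all function names are hypothetical. For this scheme E[p(n+1) | p(n)] is linear in p(n), so the mean action probability obeys E[p(n+1)] = E[p(n)] M for a row-stochastic matrix M. When 0 < a < 1 and every c[i] > 0, all entries of M are positive, so M defines an ergodic Markov chain, and its stationary distribution is proportional to 1/c[i].

```python
import random

# Hypothetical illustration (not the paper's scheme): a multiaction linear
# reward-penalty automaton with equal reward and penalty step a, in a
# stationary environment where action i is penalized with probability c[i].


def lrp_update(p, i, penalized, a):
    """Return new action probabilities after action i received a response."""
    r = len(p)
    q = [(1 - a) * pj for pj in p]      # shrink every probability by (1 - a)
    if penalized:
        for j in range(r):               # spread mass a evenly over the other actions
            if j != i:
                q[j] += a / (r - 1)
    else:
        q[i] += a                        # give mass a to the rewarded action
    return q


def lrp_step(p, c, a, rng):
    """Sample an action from p and a response from the environment, then update."""
    i = rng.choices(range(len(p)), weights=p)[0]
    return lrp_update(p, i, rng.random() < c[i], a)


def expected_update(p, c, a):
    """Exact E[p(n+1) | p(n) = p], by enumerating every action and response."""
    r = len(p)
    out = [0.0] * r
    for i in range(r):
        for penalized, prob in ((False, 1 - c[i]), (True, c[i])):
            q = lrp_update(p, i, penalized, a)
            for j in range(r):
                out[j] += p[i] * prob * q[j]
    return out


def mean_matrix(c, a):
    """Row-stochastic M such that E[p(n+1)] = E[p(n)] M for this scheme."""
    r = len(c)
    return [[1 - a * c[i] if i == j else a * c[i] / (r - 1)
             for j in range(r)] for i in range(r)]


def mean_step(pbar, M):
    """Propagate the mean action probability vector one step (row vector times M)."""
    r = len(pbar)
    return [sum(pbar[i] * M[i][j] for i in range(r)) for j in range(r)]


def limiting_mean(c, a, steps=5000):
    """Iterate the mean recursion from the uniform vector."""
    M = mean_matrix(c, a)
    pbar = [1.0 / len(c)] * len(c)
    for _ in range(steps):
        pbar = mean_step(pbar, M)
    return pbar


def time_average(c, a, steps, seed=0):
    """Average p(n) along one sampled run of the automaton."""
    rng = random.Random(seed)
    r = len(c)
    p = [1.0 / r] * r
    total = [0.0] * r
    for _ in range(steps):
        p = lrp_step(p, c, a, rng)
        total = [t + x for t, x in zip(total, p)]
    return [t / steps for t in total]
```

For example, `limiting_mean([0.2, 0.4, 0.8], 0.1)` approaches (4/7, 2/7, 1/7), and `time_average` over a long sampled run should land near the same vector, as expected for an ergodic process. In this sketch a smaller `a` pulls the nonunit eigenvalues of M closer to 1, which slows the convergence of the mean.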

References (17)

M.F. Norman, Some convergence theorems for stochastic learning models with distance diminishing operators, J. Math. Psych. (1968).
M.L. Tsetlin, On the behaviour of finite automata in random media, Avtomat. i Telemekh. (1961).
M.L. Tsetlin, Automaton Theory and the Modelling of Biological Systems (1973).
A. Paz, Introduction to Probabilistic Automata (1971).
V.I. Varshavskii et al., On the behaviour of stochastic automata with variable structure, Avtomat. i Telemekh. (1963).
K.S. Narendra and M.A.L. Thathachar, to...
K.S. Narendra et al., Learning automata—a survey, IEEE Trans. Systems Man Cybernet. (1974).
D.L. Isaacson et al., Markov Chains: Theory and Applications (1976).


Cited by (16)

A new approach to the design of reinforcement schemes for learning automata: Stochastic estimator learning algorithm
1995, Neurocomputing
In this paper a new approach to the design of S-model ergodic reinforcement learning algorithms is introduced. The new scheme utilizes a stochastic estimator and is able to operate in non-stationary environments with high accuracy and a high adaptation rate. In the stochastic estimator scheme, which is the first attempt of its kind in the field, the estimates of the mean rewards of actions are computed stochastically, so they are not strictly dependent on the environmental responses. The dependence between the stochastic estimates and the deterministic estimator's contents is more relaxed if the latter are not updated. In this way, actions that have not been selected recently have the opportunity to be estimated as ‘optimal’, to increase their choice probability, and consequently to be selected. Thus, the estimator is always recently updated and consequently able to adapt to environmental changes. The performance of the presented Stochastic Estimator Learning Automaton (SELA) is superior to that of all previous well-known S-model ergodic schemes. Furthermore, it is proved that SELA is ϵ-optimal in every S-model random environment.
Ergodic discretized estimator learning automata with high accuracy and high adaptation rate for nonstationary environments
1992, Neurocomputing
In this paper a new ergodic discretized learning automaton which is ϵ-optimal is introduced. It utilizes a new estimator learning algorithm which is based on the recent history of the environmental responses and is able to operate in nonstationary stochastic environments. The proposed automaton achieves a significantly higher performance than the classic reward-penalty ergodic schemes. Extensive simulation results indicate the superiority of the proposed scheme. Furthermore, it is proved that it is ϵ-optimal in every stochastic environment.
Continuous and discretized pursuit learning schemes: Various algorithms and their comparison
2001, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Multiple response learning automata
1996, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Adaptive Directional Neighbor Discovery Schemes in Wireless Networks
2020, 2020 International Conference on Computing, Networking and Communications, ICNC 2020
Evaluating an Adaptive Web Traffic Routing Method for the Cloud
2019, 2019 IEEE ComSoc International Communications Quality and Reliability Workshop, CQR 2019