Abstract
The precursor field to Reinforcement Learning is that of Learning Automata (LA). Within this field, Estimator Algorithms (EAs) can be said to be the state of the art. Further, the subset of Pursuit Algorithms (PAs), discovered by Thathachar and Sastry [34, 39], were the pioneering schemes. This chapter contains a comprehensive survey of the various EAs and of the most recent convergence results for PAs. Unlike the prior LA, EAs are based on a fundamentally distinct phenomenon. They are also the most accurate LA, converging in the least time. EAs operate on two vectors, namely, the action probability vector, which is updated using responses from the Environment, and quickly computed estimates of the reward probabilities of the various actions. The proofs that they are \(\upvarepsilon \)-optimal are thus very complex. They have to incorporate two rather non-orthogonal phenomena: the convergence of these estimates and the convergence of the probabilities of selecting the various actions.

For almost three decades, the reported proofs of PAs possessed an infirmity (or flaw), which we refer to as the claim of the “monotonicity” property. This flaw was discovered by the authors of [37], who also provided an alternate proof for a specific PA in which the scheme’s parameter decreased with time. This chapter first records all the reported EAs. It then presents a comprehensive survey of the proofs from a different perspective. These proofs do not require the sequence of probabilities of selecting the optimal action to be monotonic. Instead, whenever any action probability is close enough to unity, we require that the process jump to an absorbing barrier at the next time instant, i.e., in a single step. By imposing this constraint, the proofs invoke a weaker property, namely, the submartingale property of \(p_m(t)\), the probability of selecting the optimal action at time \(t\), to demonstrate \(\upvarepsilon \)-optimality.
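To make the two-vector structure concrete, the following is a minimal Python sketch of a continuous pursuit automaton with an absorbing barrier. It assumes Bernoulli (reward/penalty) responses, maximum-likelihood estimates \(\hat{d}_i = W_i / Z_i\), the pursuit update \(P(t+1) = (1-\lambda)P(t) + \lambda e_m\) applied after every response, and a hypothetical absorption threshold `threshold`. The function and parameter names are illustrative only and are not taken from the cited works; reward-inaction variants would update \(P\) only after a reward.

```python
import random


def absorbing_cpa(reward_probs, lam=0.01, threshold=0.99, init_samples=10,
                  max_steps=100_000, rng=None):
    """Simulate an absorbing continuous pursuit automaton (illustrative sketch).

    reward_probs holds the Environment's reward probabilities, which the
    automaton never sees directly; it only observes rewards and penalties.
    Returns (action absorbed to, steps taken), or the current most probable
    action and max_steps if no absorption happened within max_steps.
    """
    rng = rng or random.Random()
    r = len(reward_probs)
    # Estimator state: W[i] = rewards received for action i, Z[i] = times i was chosen.
    W = [0] * r
    Z = [0] * r
    # Seed the estimates by trying every action init_samples times.
    for i in range(r):
        for _ in range(init_samples):
            Z[i] += 1
            W[i] += rng.random() < reward_probs[i]
    P = [1.0 / r] * r  # action probability vector, uniform at the start
    for t in range(max_steps):
        i = rng.choices(range(r), weights=P)[0]     # select an action from P(t)
        beta = rng.random() < reward_probs[i]       # True = reward, False = penalty
        Z[i] += 1
        W[i] += beta
        d_hat = [W[j] / Z[j] for j in range(r)]     # maximum-likelihood estimates
        m = max(range(r), key=lambda j: d_hat[j])   # currently best-estimated action
        # Pursuit step: P(t+1) = (1 - lam) * P(t) + lam * e_m
        P = [(1 - lam) * p for p in P]
        P[m] += lam
        # Absorbing barrier: once some p_j exceeds the threshold, the automaton
        # jumps to the unit vector e_j and stays there, so the simulation stops.
        j = max(range(r), key=lambda k: P[k])
        if P[j] > threshold:
            return j, t + 1
    return max(range(r), key=lambda k: P[k]), max_steps
```

With well-separated reward probabilities such as `[0.1, 0.9, 0.2]`, the sketch is expected to absorb to action 1; a smaller `lam` makes absorption to a wrong action rarer at the cost of more steps, which is the trade-off that \(\upvarepsilon \)-optimality formalizes.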
In this way, we have proven the \(\upvarepsilon \)-optimality of the Absorbing Continuous Pursuit Algorithm (CPA) [49, 50], the Discretized PA [51, 52], and the family of Bayesian PAs [53], in which the estimates are obtained by a Bayesian (rather than a Maximum Likelihood (ML)) process.
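The discretized and Bayesian variants named above change only parts of the pursuit loop. The sketch below, again hedged and illustrative, (a) restricts the action probabilities to multiples of a smallest step \(\Delta = 1/(rN)\), where \(N\) is a resolution parameter, and (b) replaces the ML estimates with estimates drawn from Beta posteriors. Assumptions: a uniform Beta(1, 1) prior, the posterior mean as the point estimate (the Bayesian PA of [53] may use a different statistic of the posterior), and an update of \(P\) after every response. All names (`discretized_pursuit_step`, `BetaEstimator`, `discretized_bayesian_pursuit`, `resolution`) are hypothetical.

```python
import random


def discretized_pursuit_step(quanta, m):
    """One discretized pursuit update towards action m (illustrative sketch).

    quanta[j] is p_j expressed in units of the smallest step delta = 1/(r*N),
    so the entries are integers summing to r*N and no rounding error builds up.
    Every action other than m loses one quantum (never going below zero);
    action m collects exactly the probability mass that was removed.
    """
    new = [q if j == m else max(q - 1, 0) for j, q in enumerate(quanta)]
    new[m] += sum(quanta) - sum(new)
    return new


class BetaEstimator:
    """Bayesian reward estimates from Beta posteriors (uniform Beta(1, 1) prior assumed)."""

    def __init__(self, r):
        self.a = [1] * r  # 1 + number of rewards seen for each action
        self.b = [1] * r  # 1 + number of penalties seen for each action

    def update(self, i, rewarded):
        if rewarded:
            self.a[i] += 1
        else:
            self.b[i] += 1

    def estimates(self):
        # Posterior mean a / (a + b), used here as a simple stand-in statistic.
        return [a / (a + b) for a, b in zip(self.a, self.b)]


def discretized_bayesian_pursuit(reward_probs, resolution=200,
                                 max_steps=200_000, rng=None):
    """Simulate a discretized pursuit automaton driven by Bayesian estimates.

    Returns (action absorbed to, steps taken), or the current most probable
    action and max_steps if no absorption happened within max_steps.
    """
    rng = rng or random.Random()
    r = len(reward_probs)
    total = r * resolution           # p_j = quanta[j] / total
    quanta = [resolution] * r        # uniform start: p_j = 1/r
    est = BetaEstimator(r)
    for t in range(max_steps):
        i = rng.choices(range(r), weights=quanta)[0]   # select an action from P(t)
        rewarded = rng.random() < reward_probs[i]
        est.update(i, rewarded)
        d_hat = est.estimates()
        m = max(range(r), key=lambda j: d_hat[j])      # best action under the posteriors
        quanta = discretized_pursuit_step(quanta, m)
        if quanta[m] == total:       # all probability mass on one action: stop
            return m, t + 1
    return max(range(r), key=lambda j: quanta[j]), max_steps
```

Storing \(P\) as integer quanta keeps \(\sum_j p_j = 1\) exactly after every step; the sketch stops as soon as one action holds all \(rN\) quanta and treats that as absorption.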
Notes
1. In the rest of the paper, we put the counterparts for discretized algorithms in parentheses.
References
Agache, M.: Estimator Based Learning Algorithms. M.C.S. Thesis, School of Computer Science, Carleton University, Ottawa, Ontario, Canada (2000)
Agache, M., Oommen, B.J.: Generalized pursuit learning schemes: new families of continuous and discretized learning automata. IEEE Trans. Syst. Man Cybern. Part B 32(6), 738–749 (2002)
Atlassis, A.F., Loukas, N.H., Vasilakos, A.V.: The use of learning algorithms in ATM networks call admission control problem: a methodology. Comput. Netw. 34, 341–353 (2000)
Atlassis, A.F., Vasilakos, A.V.: The use of reinforcement learning algorithms in traffic control of high speed networks. Advances in Computational Intelligence and Learning, pp. 353–369 (2002)
Beigy, H., Meybodi, M.R.: Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of Sixth Brazilian Symposium on Neural Networks. JR, Brazil, pp. 24–31 (2000)
Dean, T., Angluin, D., Basye, K., Engelson, S., Aelbling, L., Maron, O.: Inferring finite automata with stochastic output functions and an application to map learning. Mach. Learn. 18, 81–108 (1995)
Erus, G., Polat, F.: A layered approach to learning coordination knowledge in multiagent environments. Appl. Intell. 27, 249–267 (2007)
Granmo, O.C.: Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans. Comput. 59(4), 545–560 (2010)
Granmo, O.C., Glimsdal, S.: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore game. Appl. Intell. 38, 479–488 (2013)
Granmo, O.C., Oommen, B.J.: On allocating limited sampling resources using a learning automata-based solution to the fractional knapsack problem. In: Proceedings of the 2006 International Intelligent Information Processing and Web Mining Conference, Advances in Soft Computing, vol. 35, Ustron, Poland, pp. 263–272 (2006)
Granmo, O.C., Oommen, B.J.: Optimal sampling for estimation with constrained resources using a learning automaton-based solution for the nonlinear fractional knapsack problem. Appl. Intell. 33(1), 3–20 (2010)
Granmo, O.C., Oommen, B.J., Myrer, S.A., Olsen, M.G.: Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans. Syst. Man Cybern. Part B 37(1), 166–175 (2007)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Hong, J., Prabhu, V.V.: Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems. Appl. Intell. 20, 71–87 (2004)
Kabudian, J., Meybodi, M.R., Homayounpour, M.M.: Applying continuous action reinforcement learning automata (CARLA) to global training of hidden Markov models. In: Proceedings of ITCC’04, the International Conference on Information Technology: Coding and Computing, Las Vegas, Nevada, pp. 638–642 (2004)
Lakshmivarahan, S.: Learning Algorithms Theory and Applications. Springer (1981)
Lanctot, J.K., Oommen, B.J.: On discretizing estimator-based learning algorithms. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 2, 1417–1422 (1991)
Lanctot, J.K., Oommen, B.J.: Discretized estimator learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 22(6), 1473–1483 (1992)
Li, J., Li, Z., Chen, J.: Microassembly path planning using reinforcement learning for improving positioning accuracy of a \(1\,\mathrm{cm}^3\) omni-directional mobile microrobot. Appl. Intell. 34, 211–225 (2011)
Meybodi, M.R., Beigy, H.: New learning automata based algorithms for adaptation of backpropagation algorithm parameters. Int. J. Neural Syst. 12, 45–67 (2002)
Misra, S., Oommen, B.J.: GPSPA: A new adaptive algorithm for maintaining shortest path routing trees in stochastic networks. Int. J. Commun. Syst. 17, 963–984 (2004)
Najim, K., Poznyak, A.S.: Learning Automata: Theory and Applications. Pergamon Press, Oxford (1994)
Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall (1989)
Obaidat, M.S., Papadimitriou, G.I., Pomportsis, A.S., Laskaridis, H.S.: Learning automata-based bus arbitration for shared-medium ATM switches. IEEE Trans. Syst. Man Cybern. Part B 32, 815–820 (2002)
Oommen, B.J.: Stochastic searching on the line and its applications to parameter learning in nonlinear optimization. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27(4), 733–739 (1997)
Oommen, B.J., Granmo, O.C., Pedersen, A.: Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of IEEE Symposium on Computational Intelligence and Games, Honolulu, HI, pp. 161–167 (2007)
Oommen, B.J., Lanctot, J.K.: Discretized pursuit learning automata. IEEE Trans. Syst. Man Cybern. 20, 931–938 (1990)
Oommen, B.J.: Absorbing and ergodic discretized two-action learning automata. IEEE Trans. Syst. Man Cybern. 16, 282–296 (1986)
Oommen, B.J., Agache, M.: Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans. Syst. Man Cybern. 31(3), 277–287 (2001)
Oommen, B.J., Croix, T.D.S.: Graph partitioning using learning automata. IEEE Trans. Comput. 45, 195–208 (1996)
Oommen, B.J., Roberts, T.D.: Continuous learning automata solutions to the capacity assignment problem. IEEE Trans. Comput. 49, 608–620 (2000)
Papadimitriou, G.I., Pomportsis, A.S.: Learning-automata-based TDMA protocols for broadcast communication systems with bursty traffic. IEEE Commun. Lett. 107–109 (2000)
Poznyak, A.S., Najim, K.: Learning Automata and Stochastic Optimization. Springer, Berlin (1997)
Sastry, P.S.: Systems of Learning Automata: Estimator Algorithms Applications. Ph.D. Thesis, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India (1985)
Rajaraman, K., Sastry, P.S.: Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 26, 590–598 (1996)
Seredynski, F.: Distributed scheduling using simple learning machines. Eur. J. Oper. Res. 107, 401–413 (1998)
Martin, R., Tilak, O.: On \(\upvarepsilon \)-optimality of the pursuit learning algorithm. J. Appl. Probab. 49(3), 795–805 (2012)
Thathachar, M.A.L., Sastry, P.S.: A class of rapidly converging algorithms for learning automata. IEEE Trans. Syst. Man Cybern. SMC-15, 168–175 (1985)
Thathachar, M.A.L., Sastry, P.S.: Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, pp. 29–32 (Dec 1986)
Thathachar, M.A.L., Sastry, P.S.: Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic, Boston (2003)
Torkestani, J.A.: An adaptive focused web crawling algorithm based on learning automata. Appl. Intell. 37, 586–601 (2012)
Unsal, C., Kachroo, P., Bay, J.S.: Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans. Syst. Man Cybern. Part A 29, 120–128 (1999)
Vafashoar, R., Meybodi, M.R., Momeni, A.A.H.: CLA-DE: a hybrid model based on cellular learning automata for numerical optimization. Appl. Intell. 36, 735–748 (2012)
Vasilakos, A., Saltouros, M.P., Atlassis, A.F., Pedrycz, W.: Optimizing QoS routing in hierarchical ATM networks using computational intelligence techniques. IEEE Trans. Syst. Man Cybern. Part C 33, 297–312 (2003)
Yazidi, A., Granmo, O.C., Oommen, B.J.: Service selection in stochastic environments: a learning-automaton based solution. Appl. Intell. 36, 617–637 (2012)
Zhang, X., Granmo, O.C., Oommen, B.J.: The Bayesian pursuit algorithm: A new family of estimator learning automata. In: Proceedings of IEAAIE2011, pp. 608–620. Springer, New York, USA (2011)
Zhang, X., Granmo, O.C., Oommen, B.J.: Discretized Bayesian pursuit—a new scheme for reinforcement learning. In: Proceedings of IEAAIE2012. Dalian, China, pp. 784–793 (2012)
Zhang, X., Granmo, O.C., Oommen, B.J.: On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl. Intell. 39, 782–792 (2013)
Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: On using the theory of regular functions to prove the \(\epsilon \)-optimality of the continuous pursuit learning automaton. In: Proceedings of IEAAIE2013, pp. 262–271. Springer, Amsterdam, Holland (2013)
Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Appl. Intell. 41, 974–985 (2014)
Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: Using the theory of regular functions to formally prove the \(\epsilon \)-optimality of discretized pursuit learning algorithms. In: Proceedings of IEAAIE2014, pp. 379–388. Springer, Kaohsiung, Taiwan (2014)
Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of discretized pursuit algorithms. Appl. Intell. (2015). https://doi.org/10.1007/s10489-015-0670-1
Zhang, X., Oommen, B.J., Granmo, O.C.: The design of absorbing Bayesian pursuit algorithms and the formal analyses of their \(\epsilon \)-optimality. Pattern Anal. Appl. 20(3) (2015)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
John Oommen, B., Zhang, X., Jiao, L. (2022). A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results. In: Nicopolitidis, P., Misra, S., Yang, L.T., Zeigler, B., Ning, Z. (eds) Advances in Computing, Informatics, Networking and Cybersecurity. Lecture Notes in Networks and Systems, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-030-87049-2_2
DOI: https://doi.org/10.1007/978-3-030-87049-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87048-5
Online ISBN: 978-3-030-87049-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)