A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results

John Oommen, B.; Zhang, Xuan; Jiao, Lei

doi:10.1007/978-3-030-87049-2_2

B. John Oommen, Xuan Zhang & Lei Jiao

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 289)

Abstract

The precursor field to Reinforcement Learning is that of Learning Automata (LA). Within this field, Estimator Algorithms (EAs) can be said to be the state of the art, and the subset of Pursuit Algorithms (PAs), discovered by Thathachar and Sastry [34, 39], were the pioneering schemes. This chapter contains a comprehensive survey of the various EAs and of the most recent convergence results for PAs. Unlike the prior LA, EAs are based on a fundamentally distinct phenomenon. They are also the most accurate LA, converging in the least time. EAs operate on two vectors, namely the action probability vector, which is updated using responses from the Environment, and quickly computed estimates of the reward probabilities of the various actions. The proofs that they are \(\upvarepsilon\)-optimal are thus very complex: they have to incorporate two rather non-orthogonal phenomena, which are the convergence of these estimates and the convergence of the probabilities of selecting the various actions. For almost three decades, the reported proofs for PAs possessed an infirmity (or flaw), which we refer to as the claim of the “monotonicity” property. This flaw was discovered by the authors of [37], who also provided an alternate proof for a specific PA in which the scheme’s parameter decreased with time. This chapter first records all the reported EAs. It then presents a comprehensive survey of the proofs from a different perspective. These proofs do not require that the sequence of probabilities of selecting the optimal action satisfies the property of monotonicity. Instead, whenever any action probability is close enough to unity, we require that the process jumps to an absorbing barrier at the next time instant, i.e., in a single step. By imposing this constraint, the proofs need only invoke a weaker property, namely the submartingale property of \(p_m(t)\), to demonstrate \(\upvarepsilon\)-optimality. We have thus proven the \(\upvarepsilon\)-optimality of the Absorbing CPA [49, 50], the Discretized PA [51, 52], and the family of Bayesian PAs [53], where the estimates are obtained by a Bayesian (rather than a Maximum Likelihood (ML)) process.
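
To make the two interacting vectors and the absorbing barrier concrete, the following is a minimal sketch of a pursuit-style automaton. It is an illustration under stated assumptions, not the chapter's exact scheme: the function name `absorbing_pursuit`, the learning parameter `lam`, the jump threshold `threshold`, the number of initial samples `init_samples`, and the simulated Bernoulli environment are all hypothetical choices made for this example. The reward estimates are simple ML (frequency) estimates, and the probability vector is moved linearly towards the action with the currently highest estimate.

```python
import random

def absorbing_pursuit(reward_probs, lam=0.01, threshold=0.999,
                      init_samples=10, max_steps=100_000, seed=0):
    """Illustrative pursuit-style automaton with an absorbing barrier.

    All parameter names and defaults are assumptions for this sketch.
    Returns the final action probability vector and the step count.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    # Initialise the ML estimates by sampling every action a few times.
    rewards = [sum(rng.random() < reward_probs[i] for _ in range(init_samples))
               for i in range(r)]
    pulls = [init_samples] * r
    p = [1.0 / r] * r  # action probability vector

    for t in range(max_steps):
        # Absorbing barrier: once some probability is close enough to
        # unity, jump to the corresponding unit vector in a single step.
        top = max(range(r), key=p.__getitem__)
        if p[top] >= threshold:
            return [1.0 if i == top else 0.0 for i in range(r)], t

        # Choose an action according to p and query the Environment.
        a = rng.choices(range(r), weights=p)[0]
        rewards[a] += rng.random() < reward_probs[a]
        pulls[a] += 1

        # ML estimates of the reward probabilities.
        d = [rewards[i] / pulls[i] for i in range(r)]
        m = max(range(r), key=d.__getitem__)

        # Pursue the currently best-estimated action:
        # p <- (1 - lam) * p + lam * e_m
        p = [(1.0 - lam) * pi for pi in p]
        p[m] += lam

    return p, max_steps

if __name__ == "__main__":
    p, steps = absorbing_pursuit([0.8, 0.4, 0.2])
    print(p, steps)
```

In this sketch a smaller `lam` gives the estimates more time to become reliable before the probability vector commits, which is the trade-off that the \(\upvarepsilon\)-optimality arguments have to control.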

Notes

1. In the rest of the paper, we put the counterparts for discretized algorithms in parentheses.

References

Agache, M.: Estimator Based Learning Algorithms. M.C.S. Thesis, School of Computer Science, Carleton University, Ottawa, Ontario, Canada (2000)
Agache, M., Oommen, B.J.: Generalized pursuit learning schemes: new families of continuous and discretized learning automata. IEEE Trans. Syst. Man Cybern. Part B 32(6), 738–749 (2002)
Atlassis, A.F., Loukas, N.H., Vasilakos, A.V.: The use of learning algorithms in ATM networks call admission control problem: a methodology. Comput. Netw. 34, 341–353 (2000)
Atlassis, A.F., Vasilakos, A.V.: The use of reinforcement learning algorithms in traffic control of high speed networks. Advances in Computational Intelligence and Learning, pp. 353–369 (2002)
Beigy, H., Meybodi, M.R.: Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of Sixth Brazilian Symposium on Neural Networks. JR, Brazil, pp. 24–31 (2000)
Dean, T., Angluin, D., Basye, K., Engelson, S., Kaelbling, L., Maron, O.: Inferring finite automata with stochastic output functions and an application to map learning. Mach. Learn. 18, 81–108 (1995)
Erus, G., Polat, F.: A layered approach to learning coordination knowledge in multiagent environments. Appl. Intell. 27, 249–267 (2007)
Granmo, O.C.: Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans. Comput. 59(4), 545–560 (2010)
Granmo, O.C., Glimsdal, S.: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore game. Appl. Intell. 38, 479–488 (2013)
Granmo, O.C., Oommen, B.J.: On allocating limited sampling resources using a learning automata-based solution to the fractional knapsack problem. In: Proceedings of the 2006 International Intelligent Information Processing and Web Mining Conference, Advances in Soft Computing, vol. 35, Ustron, Poland, pp. 263–272 (2006)
Granmo, O.C., Oommen, B.J.: Optimal sampling for estimation with constrained resources using a learning automaton-based solution for the nonlinear fractional knapsack problem. Appl. Intell. 33(1), 3–20 (2010)
Granmo, O.C., Oommen, B.J., Myrer, S.A., Olsen, M.G.: Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans. Syst. Man Cybern. Part B 37(1), 166–175 (2007)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Hong, J., Prabhu, V.V.: Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems. Appl. Intell. 20, 71–87 (2004)
Kabudian, J., Meybodi, M.R., Homayounpour, M.M.: Applying continuous action reinforcement learning automata (CARLA) to global training of hidden Markov models. In: Proceedings of ITCC’04, the International Conference on Information Technology: Coding and Computing, Las Vegas, Nevada, pp. 638–642 (2004)
Lakshmivarahan, S.: Learning Algorithms Theory and Applications. Springer (1981)
Lanctot, J.K., Oommen, B.J.: On discretizing estimator-based learning algorithms. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 2, 1417–1422 (1991)
Lanctot, J.K., Oommen, B.J.: Discretized estimator learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 22(6), 1473–1483 (1992)
Li, J., Li, Z., Chen, J.: Microassembly path planning using reinforcement learning for improving positioning accuracy of a \(1~\mathrm{cm}^3\) omni-directional mobile microrobot. Appl. Intell. 34, 211–225 (2011)
Meybodi, M.R., Beigy, H.: New learning automata based algorithms for adaptation of backpropagation algorithm parameters. Int. J. Neural Syst. 12, 45–67 (2002)
Misra, S., Oommen, B.J.: GPSPA: a new adaptive algorithm for maintaining shortest path routing trees in stochastic networks. Int. J. Commun. Syst. 17, 963–984 (2004)
Najim, K., Poznyak, A.S.: Learning Automata: Theory and Applications. Pergamon Press, Oxford (1994)
Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall (1989)
Obaidat, M.S., Papadimitriou, G.I., Pomportsis, A.S., Laskaridis, H.S.: Learning automata-based bus arbitration for shared-medium ATM switches. IEEE Trans. Syst. Man Cybern. Part B 32, 815–820 (2002)
Oommen, B.J.: Stochastic searching on the line and its applications to parameter learning in nonlinear optimization. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27(4), 733–739 (1997)
Oommen, B.J., Granmo, O.C., Pedersen, A.: Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of IEEE Symposium on Computational Intelligence and Games, Honolulu, HI, pp. 161–167 (2007)
Oommen, B.J., Lanctot, J.K.: Discretized pursuit learning automata. IEEE Trans. Syst. Man Cybern. 20, 931–938 (1990)
Oommen, B.J.: Absorbing and ergodic discretized two-action learning automata. IEEE Trans. Syst. Man Cybern. 16, 282–296 (1986)
Oommen, B.J., Agache, M.: Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans. Syst. Man Cybern. 31(3), 277–287 (2001)
Oommen, B.J., Croix, T.D.S.: Graph partitioning using learning automata. IEEE Trans. Comput. 45, 195–208 (1996)
Oommen, B.J., Roberts, T.D.: Continuous learning automata solutions to the capacity assignment problem. IEEE Trans. Comput. 49, 608–620 (2000)
Papadimitriou, G.I., Pomportsis, A.S.: Learning-automata-based TDMA protocols for broadcast communication systems with bursty traffic. IEEE Commun. Lett. 107–109 (2000)
Poznyak, A.S., Najim, K.: Learning Automata and Stochastic Optimization. Springer, Berlin (1997)
Sastry, P.S.: Systems of Learning Automata: Estimator Algorithms Applications. Ph.D. Thesis, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India (1985)
Rajaraman, K., Sastry, P.S.: Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 26, 590–598 (1996)
Seredynski, F.: Distributed scheduling using simple learning machines. Eur. J. Oper. Res. 107, 401–413 (1998)
Martin, R., Tilak, O.: On \(\upvarepsilon\)-optimality of the pursuit learning algorithm. J. Appl. Probab. 49(3), 795–805 (2012)
Thathachar, M.A.L., Sastry, P.S.: A class of rapidly converging algorithms for learning automata. IEEE Trans. Syst. Man Cybern. SMC-15, 168–175 (1985)
Thathachar, M.A.L., Sastry, P.S.: Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, Dec 1986, pp. 29–32
Thathachar, M.A.L., Sastry, P.S.: Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic, Boston (2003)
Torkestani, J.A.: An adaptive focused web crawling algorithm based on learning automata. Appl. Intell. 37, 586–601 (2012)
Unsal, C., Kachroo, P., Bay, J.S.: Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans. Syst. Man Cybern. Part A 29, 120–128 (1999)
Vafashoar, R., Meybodi, M.R., Momeni, A.A.H.: CLA-DE: a hybrid model based on cellular learning automata for numerical optimization. Appl. Intell. 36, 735–748 (2012)
Vasilakos, A., Saltouros, M.P., Atlassis, A.F., Pedrycz, W.: Optimizing QoS routing in hierarchical ATM networks using computational intelligence techniques. IEEE Trans. Syst. Man Cybern. Part C 33, 297–312 (2003)
Yazidi, A., Granmo, O.C., Oommen, B.J.: Service selection in stochastic environments: a learning-automaton based solution. Appl. Intell. 36, 617–637 (2012)
Zhang, X., Granmo, O.C., Oommen, B.J.: The Bayesian pursuit algorithm: a new family of estimator learning automata. In: Proceedings of IEAAIE2011, pp. 608–620. Springer, New York, USA (2011)
Zhang, X., Granmo, O.C., Oommen, B.J.: Discretized Bayesian pursuit—a new scheme for reinforcement learning. In: Proceedings of IEAAIE2012. Dalian, China, pp. 784–793 (2012)
Zhang, X., Granmo, O.C., Oommen, B.J.: On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl. Intell. 39, 782–792 (2013)
Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: On using the theory of regular functions to prove the \(\upvarepsilon\)-optimality of the continuous pursuit learning automaton. In: Proceedings of IEAAIE2013, pp. 262–271. Springer, Amsterdam, Holland (2013)
Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Appl. Intell. 41, 974–985 (2014)
Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: Using the theory of regular functions to formally prove the \(\epsilon \)-optimality of discretized pursuit learning algorithms. In: Proceedings of IEAAIE2014, pp. 379–388. Springer, Kaohsiung, Taiwan (2014)
Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of discretized pursuit algorithms. Appl. Intell. (2015). https://doi.org/10.1007/s10489-015-0670-1
Zhang, X., Oommen, B.J., Granmo, O.C.: The design of absorbing Bayesian pursuit algorithms and the formal analyses of their \(\upvarepsilon\)-optimality. Pattern Anal. Appl. 20(3) (2015)

Author information

Authors and Affiliations

School of Computer Science, Carleton University, Ottawa, Canada
B. John Oommen
Centre for Artificial Intelligence Research, University of Agder, Grimstad, Norway
B. John Oommen & Lei Jiao
NORCE, Norwegian Research Center, Jon Lilletuns vei 9, 4879, Grimstad, Norway
Xuan Zhang

Corresponding author

Correspondence to B. John Oommen.

Editor information

Editors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Petros Nicopolitidis
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Sudip Misra
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA
Bernard Zeigler
Dalian University of Technology, Dalian, China
Zhaolong Ning

About this chapter

Cite this chapter

John Oommen, B., Zhang, X., Jiao, L. (2022). A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results. In: Nicopolitidis, P., Misra, S., Yang, L.T., Zeigler, B., Ning, Z. (eds) Advances in Computing, Informatics, Networking and Cybersecurity. Lecture Notes in Networks and Systems, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-030-87049-2_2

DOI: https://doi.org/10.1007/978-3-030-87049-2_2
Published: 03 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87048-5
Online ISBN: 978-3-030-87049-2
eBook Packages: Intelligent Technologies and Robotics (R0)
