A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results

  • Chapter

Advances in Computing, Informatics, Networking and Cybersecurity

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 289)

Abstract

The precursor field to Reinforcement Learning is that of Learning Automata (LA). Within this field, Estimator Algorithms (EAs) can be said to be the state of the art, and the subset of Pursuit Algorithms (PAs), introduced by Thathachar and Sastry [34, 39], constituted the pioneering schemes. This chapter contains a comprehensive survey of the various EAs, together with the most recent convergence results for PAs. Unlike the prior LA, EAs are based on a fundamentally distinct phenomenon, and they are also the most accurate LA, converging in the least time. EAs operate on two vectors, namely, the action probability vector, which is updated using the responses from the Environment, and a vector of quickly-computed estimates of the reward probabilities of the various actions. The proofs that they are \(\upvarepsilon \)-optimal are thus very complex, since they have to incorporate two rather non-orthogonal phenomena: the convergence of these estimates and the convergence of the probabilities of selecting the various actions. For almost three decades, the reported proofs for PAs possessed an infirmity (or flaw), which we refer to as the claim of the “monotonicity” property. This flaw was discovered by the authors of [37], who also provided an alternate proof for a specific PA in which the scheme’s parameter decreased with time. This chapter first records all the reported EAs. It then reports a comprehensive survey of the proofs from a different perspective. These proofs do not require that the sequence of probabilities of selecting the optimal action satisfies the property of monotonicity. Instead, whenever any action probability is close enough to unity, we require that the process jumps to an absorbing barrier at the next time instant, i.e., in a single step. By imposing such a constraint, these proofs invoke the weaker property, i.e., the submartingale property of \(p_m(t)\), to demonstrate the \(\upvarepsilon \)-optimality. We have thus proven the \(\upvarepsilon \)-optimality of the Absorbing Continuous Pursuit Algorithm (CPA) [49, 50], the Discretized PA [51, 52], and the family of Bayesian PAs [53], in which the estimates are obtained by a Bayesian (rather than a Maximum Likelihood (ML)) process.
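
To make the two-vector mechanism concrete, the following is a minimal Python sketch of the classical Continuous Pursuit Algorithm run against a Bernoulli Environment: the action probability vector is "pursued" towards the unit vector of the action with the currently largest ML reward estimate. The reward probabilities, the learning parameter lam and the horizon below are illustrative assumptions, not values taken from this chapter.

```python
import random

def continuous_pursuit(reward_probs, lam=0.01, horizon=20000, seed=0):
    """Simulate a Continuous Pursuit Algorithm (CPA) on a Bernoulli Environment.

    reward_probs : the Environment's (unknown) reward probabilities, one per action
    lam          : the learning (pursuit) parameter -- an illustrative value
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r          # action probability vector, initially uniform
    rewarded = [0] * r         # times each action was rewarded
    chosen = [0] * r           # times each action was selected
    d_hat = [0.0] * r          # ML estimates of the reward probabilities

    for _ in range(horizon):
        # Select an action according to the current action probability vector.
        a = rng.choices(range(r), weights=p)[0]
        beta = 1 if rng.random() < reward_probs[a] else 0  # Environment's response

        # Update the ML estimate for the chosen action.
        chosen[a] += 1
        rewarded[a] += beta
        d_hat[a] = rewarded[a] / chosen[a]

        # Pursuit step: move p towards the unit vector of the action with the
        # currently largest estimated reward probability, m = argmax d_hat.
        m = max(range(r), key=lambda i: d_hat[i])
        p = [(1.0 - lam) * x for x in p]
        p[m] += lam

    return p, d_hat

if __name__ == "__main__":
    p, d_hat = continuous_pursuit([0.8, 0.6, 0.4])
    print("Final action probabilities:", [round(x, 3) for x in p])
    print("Estimated reward probabilities:", [round(x, 3) for x in d_hat])
```

With a small learning parameter, the action probability vector typically converges to the unit vector of the optimal action; the discretized and Bayesian variants surveyed in this chapter modify, respectively, the update rule for the action probabilities and the manner in which the reward estimates are computed.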

Notes

  1. In the rest of the chapter, we put the counterparts for discretized algorithms in parentheses.

References

  1. Agache, M.: Estimator Based Learning Algorithms. M.C.S. Thesis, School of Computer Science, Carleton University, Ottawa, Ontario, Canada (2000)

  2. Agache, M., Oommen, B.J.: Generalized pursuit learning schemes: new families of continuous and discretized learning Automata. IEEE Trans. Syst. Man Cybern. Part B 32(6), 738–749 (2002)

  3. Atlassis, A.F., Loukas, N.H., Vasilakos, A.V.: The use of learning algorithms in ATM networks call admission control problem: a methodology. Comput. Netw. 34, 341–353 (2000)

  4. Atlassis, A.F., Vasilakos, A.V.: The use of reinforcement learning algorithms in traffic control of high speed networks. Advances in Computational Intelligence and Learning, pp. 353–369 (2002)

  5. Beigy, H., Meybodi, M.R.: Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of Sixth Brazilian Symposium on Neural Networks, RJ, Brazil, pp. 24–31 (2000)

  6. Dean, T., Angluin, D., Basye, K., Engelson, S., Aelbling, L., Maron, O.: Inferring finite automata with stochastic output functions and an application to map learning. Mach. Learn. 18, 81–108 (1995)

  7. Erus, G., Polat, F.: A layered approach to learning coordination knowledge in multiagent environments. Appl. Intell. 27, 249–267 (2007)

  8. Granmo, O.C.: Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans. Comput. 59(4), 545–560 (2010)

  9. Granmo, O.C., Glimsdal, S.: Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore game. Appl. Intell. 38, 479–488 (2013)

  10. Granmo, O.C., Oommen, B.J.: On allocating limited sampling resources using a learning automata-based solution to the fractional knapsack problem. In: Proceedings of the 2006 International Intelligent Information Processing and Web Mining Conference, Advances in Soft Computing, vol. 35, Ustron, Poland, pp. 263–272 (2006)

  11. Granmo, O.C., Oommen, B.J.: Optimal sampling for estimation with constrained resources using a learning automaton-based solution for the nonlinear fractional knapsack problem. Appl. Intell. 33(1), 3–20 (2010)

  12. Granmo, O.C., Oommen, B.J., Myrer, S.A., Olsen, M.G.: Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans. Syst. Man Cybern. Part B 37(1), 166–175 (2007)

  13. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)

  14. Hong, J., Prabhu, V.V.: Distributed reinforcement learning control for batch sequencing and sizing in just-in-time manufacturing systems. Appl. Intell. 20, 71–87 (2004)

  15. Kabudian, J., Meybodi, M.R., Homayounpour, M.M.: Applying continuous action reinforcement learning automata (CARLA) to global training of hidden markov models. In: Proceedings of ITCC’04, the International Conference on Information Technology: Coding and Computing, Las Vegas, Nevada, 2004, pp. 638–642

  16. Lakshmivarahan, S.: Learning Algorithms Theory and Applications. Springer (1981)

  17. Lanctot, J.K., Oommen, B.J.: On discretizing estimator-based learning algorithms. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 2, 1417–1422 (1991)

  18. Lanctot, J.K., Oommen, B.J.: Discretized estimator learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 22(6), 1473–1483 (1992)

  19. Li, J., Li, Z., Chen, J.: Microassembly path planning using reinforcement learning for improving positioning accuracy of a \(1~cm^3\) omni-directional mobile microrobot. Appl. Intell. 34, 211–225 (2011)

  20. Meybodi, M.R., Beigy, H.: New learning automata based algorithms for adaptation of backpropagation algorithm parameters. Int. J. Neural Syst. 12, 45–67 (2002)

  21. Misra, S., Oommen, B.J.: GPSPA: a new adaptive algorithm for maintaining shortest path routing trees in stochastic networks. Int. J. Commun. Syst. 17, 963–984 (2004)

  22. Najim, K., Poznyak, A.S.: Learning Automata: Theory and Applications. Pergamon Press, Oxford (1994)

  23. Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall (1989)

  24. Obaidat, M.S., Papadimitriou, G.I., Pomportsis, A.S., Laskaridis, H.S.: Learning automata-based bus arbitration for shared-medium ATM switches. IEEE Trans. Syst. Man Cybern. Part B 32, 815–820 (2002)

  25. Oommen, B.J.: Stochastic searching on the line and its applications to parameter learning in nonlinear optimization. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 27(4), 733–739 (1997)

  26. Oommen, B.J., Granmo, O.C., Pedersen, A.: Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of IEEE Symposium on Computational Intelligence and Games, Honolulu, HI, pp. 161–167 (2007)

  27. Oommen, B.J., Lanctot, J.K.: Discretized pursuit learning automata. IEEE Trans. Syst. Man Cybern. 20, 931–938 (1990)

  28. Oommen, B.J.: Absorbing and ergodic discretized two-action learning automata. IEEE Trans. Syst. Man Cybern. 16, 282–296 (1986)

  29. Oommen, B.J., Agache, M.: Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans. Syst. Man Cybern. 31(3), 277–287 (2001)

  30. Oommen, B.J., Croix, T.D.S.: Graph partitioning using learning automata. IEEE Trans. Comput. 45, 195–208 (1996)

  31. Oommen, B.J., Roberts, T.D.: Continuous learning automata solutions to the capacity assignment problem. IEEE Trans. Comput. 49, 608–620 (2000)

  32. Papadimitriou, G.I., Pomportsis, A.S.: Learning-automata-based TDMA protocols for broadcast communication systems with bursty traffic. IEEE Commun. Lett. 107–109 (2000)

  33. Poznyak, A.S., Najim, K.: Learning Automata and Stochastic Optimization. Springer, Berlin (1997)

  34. Sastry, P.S.: Systems of Learning Automata: Estimator Algorithms and Applications. Ph.D. Thesis, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India (1985)

  35. Rajaraman, K., Sastry, P.S.: Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 26, 590–598 (1996)

  36. Seredynski, F.: Distributed scheduling using simple learning machines. Eur. J. Oper. Res. 107, 401–413 (1998)

  37. Martin, R., Tilak, O.: On \(\upvarepsilon \)-optimality of the pursuit learning algorithm. J. Appl. Probab. 49(3), 795–805 (2012)

  38. Thathachar, M.A.L., Sastry, P.S.: A class of rapidly converging algorithms for learning automata. IEEE Trans. Syst. Man Cybern. SMC-15, 168–175 (1985)

  39. Thathachar, M.A.L., Sastry, P.S.: Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, Dec 1986, pp. 29–32

  40. Thathachar, M.A.L., Sastry, P.S.: Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic, Boston (2003)

  41. Torkestani, J.A.: An adaptive focused web crawling algorithm based on learning automata. Appl. Intell. 37, 586–601 (2012)

  42. Unsal, C., Kachroo, P., Bay, J.S.: Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans. Syst. Man Cybern. Part A 29, 120–128 (1999)

  43. Vafashoar, R., Meybodi, M.R., Momeni, A.A.H.: CLA-DE: a hybrid model based on cellular learning automata for numerical optimization. Appl. Intell. 36, 735–748 (2012)

  44. Vasilakos, A., Saltouros, M.P., Atlassis, A.F., Pedrycz, W.: Optimizing QoS routing in hierarchical ATM networks using computational intelligence techniques. IEEE Trans. Syst. Man Cybern. Part C 33, 297–312 (2003)

  45. Yazidi, A., Granmo, O.C., Oommen, B.J.: Service selection in stochastic environments: a learning-automaton based solution. Appl. Intell. 36, 617–637 (2012)

  46. Zhang, X., Granmo, O.C., Oommen, B.J.: The Bayesian pursuit algorithm: a new family of estimator learning automata. In: Proceedings of IEAAIE2011, pp. 608–620. Springer, New York, USA (2011)

  47. Zhang, X., Granmo, O.C., Oommen, B.J.: Discretized Bayesian pursuit—a new scheme for reinforcement learning. In: Proceedings of IEAAIE2012. Dalian, China, pp. 784–793 (2012)

  48. Zhang, X., Granmo, O.C., Oommen, B.J.: On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl. Intell. 39, 782–792 (2013)

  49. Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: On using the theory of regular functions to prove the \(\epsilon \)-optimality of the continuous pursuit learning automaton. In: Proceedings of IEAAIE2013, pp. 262–271. Springer, Amsterdam, The Netherlands (2013)

  50. Zhang, X., Granmo, O.C., Oommen, B.J., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Appl. Intell. 41, 974–985 (2014)

  51. Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: Using the theory of regular functions to formally prove the \(\epsilon \)-optimality of discretized pursuit learning algorithms. In: Proceedings of IEAAIE2014, pp. 379–388. Springer, Kaohsiung, Taiwan (2014)

  52. Zhang, X., Oommen, B.J., Granmo, O.C., Jiao, L.: A formal proof of the \(\upvarepsilon \)-optimality of discretized pursuit algorithms. Appl. Intell. (2015). https://doi.org/10.1007/s10489-015-0670-1

  53. Zhang, X., Oommen, B.J., Granmo, O.C.: The design of absorbing Bayesian pursuit algorithms and the formal analyses of their \(\epsilon \)-optimality. Pattern Anal. Appl. 20(3) (2015)

Author information

Correspondence to B. John Oommen.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

John Oommen, B., Zhang, X., Jiao, L. (2022). A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results. In: Nicopolitidis, P., Misra, S., Yang, L.T., Zeigler, B., Ning, Z. (eds) Advances in Computing, Informatics, Networking and Cybersecurity. Lecture Notes in Networks and Systems, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-030-87049-2_2
