Abstract
For the past six decades, the operation of Learning Automata (LA) has involved states and action probabilities. These have been central to “remembering” the quality of the actions chosen during learning. The latest enhancements have also incorporated estimates of the actions’ reward probabilities. However, a phenomenon that has never been used to date is the ordering of the actions themselves. Ordering the actions in traditional LA is rather meaningless unless one invokes the theory of Random Races [1]. However, we show that such an ordering makes sense if the automata operate hierarchically, within a tree, with the actions placed at the leaves. In this paper, we show that when the LA are arranged “in a tree formation”, and when the learning is achieved within such a tree, the hierarchical LA performs better if the actions located at the leaves of the tree are arranged suitably. While this concept can be incorporated in any hierarchical LA, we demonstrate its power for the most recent machine, i.e., the Hierarchical Discretized Pursuit Automaton (HDPA). These strategies can also be included in the Hierarchical Continuous Pursuit Automaton (HCPA), and in the versions of both machines that utilize traditional Maximum Likelihood (ML) or Bayesian estimates [2]. The experimental results presented here are very impressive, and so, if we consider the chronology of LA from FSSA through VSSA, the Estimator schemes, and the recent hierarchical LA, our modest claim is that the inclusion of the ADE represents the state of the art, which is not easily surpassed.
Notes
- 1.
The term LA is used interchangeably to address the field of Learning Automata or the Learning Automata themselves, depending on the context.
- 2.
The proof that the ADE approach represents a superior solution compared with unordered solutions will be presented in the extended version of the paper [16].
- 3.
Although the algorithm has been explained verbally in detail in this paper, a more detailed programmatic description of the algorithm will be presented in an extended version of the paper [16].
- 4.
In our experiments, convergence is declared once any of the LA has attained a certain threshold for choosing one of the actions in its action probability vector. In [15], however, the authors defined convergence as being achieved only when all the LA along the path to a leaf action had attained the prescribed threshold. Thus, the convergence criterion in this paper is different, i.e., it uses the “logical or” instead of the “logical and”, which makes both algorithms (the HDPA without and with the ADE) converge faster.
- 5.
The speed of the HDPA with the ADE, compared with the vanilla HDPA, does decrease slightly for the ascending/descending case. Nevertheless, given the significant speed gain in the other cases, the average speed increases.
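The difference between the two convergence criteria described in Note 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of each LA as a plain list of action probabilities, and the threshold value 0.99 are all assumptions made for illustration. The “logical and” check is also simplified, since it only tests whether each LA on a path has some component at or above the threshold.

```python
from typing import Sequence

ProbVector = Sequence[float]


def la_has_converged(probs: ProbVector, threshold: float) -> bool:
    """Return True if some action probability of one LA reaches the threshold."""
    return max(probs) >= threshold


def converged_or(all_las: Sequence[ProbVector], threshold: float) -> bool:
    """'Logical or' criterion: any single LA reaching the threshold suffices."""
    return any(la_has_converged(p, threshold) for p in all_las)


def converged_and(path_las: Sequence[ProbVector], threshold: float) -> bool:
    """'Logical and' criterion: every LA on the path to a leaf must reach it."""
    return all(la_has_converged(p, threshold) for p in path_las)


if __name__ == "__main__":
    # Hypothetical two-action LA on one root-to-leaf path of a small tree.
    path = [[0.995, 0.005], [0.60, 0.40], [0.50, 0.50]]
    threshold = 0.99  # assumed value, not taken from the paper

    print("or-criterion :", converged_or(path, threshold))
    print("and-criterion:", converged_and(path, threshold))
```

Because the “or” criterion is satisfied as soon as one automaton is confident, it can never report convergence later than the “and” criterion on the same run, which is consistent with the faster convergence noted above.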
References
Ng, D.T.H., Oommen, B.J., Hansen, E.R.: Adaptive learning mechanisms for ordering actions using random races. IEEE Trans. Syst. Man Cybern. 23(5), 1450–1465 (1993)
Zhang, X., Granmo, O.-C., Oommen, B.J.: On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl. Intell. 39(4), 782–792 (2013)
Tsetlin, M.L.: Automaton Theory and the Modeling of Biological Systems. Academic Press, New York (1973)
Lakshmivarahan, S.: Learning Algorithms Theory and Applications, ed. 1. Springer, New York (1981). https://doi.org/10.1007/978-1-4612-5975-6
Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Dover Books on Electrical Engineering Series, Dover Publications, Courier Corporation (2013)
Tsetlin, M.L.: Finite automata and modeling the simplest forms of behavior. Uspekhi Matem Nauk 8(4), 1–26 (1963)
Lakshmivarahan, S., Thathachar, M.A.L.: Absolutely expedient learning algorithms for stochastic automata. IEEE Trans. Syst. Man Cybern. SMC–3(3), 281–286 (1973)
Oommen, B.J.: Absorbing and ergodic discretized two-action learning automata. IEEE Trans. Syst. Man Cybern. 16(2), 282–293 (1986)
Oommen, B.J., Agache, M.: Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 31(3), 277–287 (2001)
Zhang, X., Granmo, O.-C., Oommen, B.J.: Discretized Bayesian Pursuit – a new scheme for reinforcement learning. In: Jiang, H., Ding, W., Ali, M., Wu, X. (eds.) IEA/AIE 2012. LNCS (LNAI), vol. 7345, pp. 784–793. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31087-4_79
Thathachar, M.A.L., Sastry, P.S.: Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Department of Electrical Engineering, Indian Institute of Science (1986)
Zhang, X., Oommen, B.J., Granmo, O.-C.: The design of absorbing Bayesian pursuit algorithms and the formal analyses of their \(\epsilon \)-optimality. Pattern Anal. Appl. 20, 797–808 (2017)
Lanctot, J.K., Oommen, B.J.: Discretized estimator learning automata. IEEE Trans. Syst. Man Cybern. 22(6), 1473–1483 (1992)
Yazidi, A., Zhang, X., Jiao, L., Oommen, B.J.: The hierarchical continuous pursuit learning automation: a novel scheme for environments with large numbers of actions. IEEE Trans. Neural Networks Learn. Syst. 31(2), 512–526 (2020)
Omslandseter, R.O., Jiao, L., Zhang, X., Yazidi, A., Oommen, B.J.: The hierarchical discrete learning automaton suitable for environments with many actions and high accuracy requirements. In: Long, G., Yu, X., Wang, S. (eds.) AI 2022. LNCS (LNAI), vol. 13151, pp. 507–518. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-97546-3_41
Omslandseter, R.O., Jiao, L., Oommen, B.J.: Pioneering approaches for enhancing the speed of hierarchical LA by ordering the actions. Unabridged version of this paper. To be submitted for publication
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Omslandseter, R.O., Jiao, L., Oommen, B.J. (2022). Enhancing the Speed of Hierarchical Learning Automata by Ordering the Actions - A Pioneering Approach. In: Aziz, H., Corrêa, D., French, T. (eds) AI 2022: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13728. Springer, Cham. https://doi.org/10.1007/978-3-031-22695-3_54
DOI: https://doi.org/10.1007/978-3-031-22695-3_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22694-6
Online ISBN: 978-3-031-22695-3
eBook Packages: Computer Science (R0)