Abstract
A learning automaton (LA) is a reinforcement learning unit that learns the optimal action in a stochastic environment. Considerable effort has gone into improving the performance of LA in environments that provide only a reward or a penalty. In many practical scenarios, however, the feedback from the environment is split into multiple levels. The LA community refers to this latter kind of environment as the Q-model. This paper studies LA in Q-model environments, which have received little attention so far. We propose BILAML, a novel Bayesian inference-based LA that is capable of operating in Q-model environments. We use Bayesian inference to estimate the environment’s response to each action, and then adopt the KL divergence as the metric for adaptive decision-making. The BILAML scheme is proved to be 𝜖-optimal, and comprehensive experiments show that it outperforms established LA frameworks.
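The two ingredients named in the abstract, Bayesian estimation of each action's response and the KL divergence, can be illustrated with a minimal sketch. This is not the BILAML algorithm itself, whose details are not given here. It assumes a Dirichlet prior over a fixed set of feedback levels and samples every action equally often instead of choosing adaptively. The level values, the true response distributions, and the names `LEVELS`, `posterior_mean`, `kl_divergence`, `expected_feedback` and `run_demo` are all hypothetical.

```python
import math
import random

# Hypothetical feedback values of a three-level environment (1.0 = most favorable).
LEVELS = [0.0, 0.5, 1.0]


def posterior_mean(counts):
    """Mean of a Dirichlet posterior whose parameters are `counts`."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_feedback(probs):
    """Mean feedback value under a distribution over LEVELS."""
    return sum(p * v for p, v in zip(probs, LEVELS))


def run_demo(true_probs, pulls_per_action=1000, seed=0):
    """Estimate each action's response distribution from simulated feedback."""
    rng = random.Random(seed)
    # Assumed Dirichlet(1, ..., 1) prior: one pseudo-count per feedback level.
    counts = [[1] * len(LEVELS) for _ in true_probs]
    for action, probs in enumerate(true_probs):
        for _ in range(pulls_per_action):
            level = rng.choices(range(len(LEVELS)), weights=probs)[0]
            counts[action][level] += 1  # conjugate posterior update
    estimates = [posterior_mean(c) for c in counts]
    best = max(range(len(estimates)),
               key=lambda a: expected_feedback(estimates[a]))
    return estimates, best


# Assumed true response distributions of two actions over the three levels.
estimates, best = run_demo([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
print("estimated response distributions:", estimates)
print("action with highest estimated mean feedback:", best)
print("KL(estimate[0] || estimate[1]):",
      kl_divergence(estimates[0], estimates[1]))
```

A large KL divergence between two estimated response distributions suggests that the actions are easy to tell apart. How BILAML actually turns that quantity into a decision rule is described in the paper, not in this sketch.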





Notes
In this paper, the terms multi-level environment and Q-model environment are used interchangeably.
Acknowledgements
This research was funded by the National Natural Science Foundation of China under Grant 61971283 and by the 2020 Industrial Internet Innovation Development Project of the Ministry of Industry and Information Technology of P.R. China, “Smart energy Internet security situation awareness platform project”.
Cite this article
Di, C., Li, F., Li, S. et al. Bayesian inference based learning automaton scheme in Q-model environments. Appl Intell 51, 7453–7468 (2021). https://doi.org/10.1007/s10489-021-02230-8
DOI: https://doi.org/10.1007/s10489-021-02230-8