Bayesian inference based learning automaton scheme in Q-model environments

Applied Intelligence

Abstract

A learning automaton (LA) is a reinforcement learning unit that learns the optimal action in a stochastic environment. Great effort has been devoted to improving the performance of LA in environments that provide only a reward or a penalty. In many practical scenarios, however, the feedback from the environment is split into multiple levels. The LA community refers to the latter kind of environment as the Q-model. This paper studies LA in Q-model environments, which have so far received little attention. We propose BILAML, a novel Bayesian inference-based LA that is capable of operating in Q-model environments. We use Bayesian inference to estimate the environment’s response to each action, and then adopt the KL divergence as the metric for adaptive decision-making. The BILAML scheme is proved to be 𝜖-optimal, and comprehensive experiments show that it outperforms established LA frameworks.
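The abstract does not spell out the estimator or the decision rule, so the following is only a minimal sketch of the two ingredients it names, not the BILAML algorithm itself. It assumes a symmetric Dirichlet prior over the feedback levels of each action, assumes that a higher level value is more favorable, and explores uniformly at random in place of the paper's KL-based decision rule. All function names, the environment `env`, and the level values are hypothetical.

```python
import math
import random


def dirichlet_posterior_mean(counts, prior=1.0):
    """Posterior mean of a categorical distribution under a symmetric
    Dirichlet(prior) prior (an assumed model, not taken from the paper)."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions; q must be positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_feedback(probs, levels):
    """Mean feedback value implied by a distribution over the levels."""
    return sum(p * v for p, v in zip(probs, levels))


def simulate(env, steps=2000, seed=0):
    """Sample actions uniformly (illustration only) and return the
    posterior mean over feedback levels for each action."""
    rng = random.Random(seed)
    n_levels = len(env[0])
    counts = [[0] * n_levels for _ in env]
    for _ in range(steps):
        action = rng.randrange(len(env))
        level = rng.choices(range(n_levels), weights=env[action])[0]
        counts[action][level] += 1
    return [dirichlet_posterior_mean(c) for c in counts]


# Hypothetical three-level environment with two actions.
levels = [0.0, 0.5, 1.0]                 # assumed: higher is more favorable
env = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]  # true level probabilities per action
posteriors = simulate(env)
best = max(range(len(env)),
           key=lambda a: expected_feedback(posteriors[a], levels))
gap = kl_divergence(posteriors[0], posteriors[1])
```

Here `best` is the action whose estimated mean feedback is highest, and `gap` measures how far apart the two estimated response distributions are. How the paper actually turns the KL divergence into an action-selection rule is not stated in the abstract.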


Notes

  1. In this paper, the terms multi-level environment and Q-model environment are used interchangeably.


Acknowledgements

This research was funded by the National Natural Science Foundation of China under Grant 61971283 and by the 2020 Industrial Internet Innovation Development Project of the Ministry of Industry and Information Technology of P.R. China, “Smart energy Internet security situation awareness platform project”.

Author information

Corresponding author

Correspondence to Shenghong Li.

About this article

Cite this article

Di, C., Li, F., Li, S. et al. Bayesian inference based learning automaton scheme in Q-model environments. Appl Intell 51, 7453–7468 (2021). https://doi.org/10.1007/s10489-021-02230-8

  • DOI: https://doi.org/10.1007/s10489-021-02230-8
