A set of novel continuous action-set reinforcement learning automata models to optimize continuous functions

Applied Intelligence

Abstract

Learning automata (LA), a powerful tool for reinforcement learning within the field of artificial intelligence, can adaptively search for the optimal state in a random environment. Over the past decades, quite a few finite action-set learning automata (FALA) algorithms have matured, but they expose critical defects when applied to the optimization of continuous functions. To overcome these shortcomings and explore a higher-performance LA, we propose a novel continuous action-set learning automata (CALA) algorithm that solves function optimization problems via one kind of LA prototype, namely the continuous action-set reinforcement learning automaton, abbreviated as CARLA. The key mechanism of the proposed algorithm is a combination of equidistant discretization and linear interpolation. Specifically, four categories of application models are constructed. Two of them are created to obtain continuous actions when the prior information is finite, thus avoiding the drawbacks of FALA. They achieve this by means of the cumulative distribution function (CDF) and a new concept, the area surrounded by curves (AsbC), respectively. The other two models are modified versions that balance the trade-off between accuracy and speed. Moreover, these models are extended to generalized versions so that multidimensional function optimization problems can be handled as well. Extensive experiments, including four benchmarks and three scenarios, are designed to demonstrate the effectiveness and efficiency of the proposed application models. The proposed algorithm outperforms state-of-the-art LA and optimization algorithms, with a high accuracy rate, a fast convergence speed, and competitive time consumption, especially in noisy environments.
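To make the idea of combining equidistant discretization, a CDF, and linear interpolation more concrete, the following is a minimal sketch of how a continuous action could be drawn from a finite set of probabilities. It is not the authors' model: the function names (`equidistant_grid`, `cumulative`, `sample_action`) are hypothetical, and the sketch assumes the probability mass is spread uniformly inside each grid interval, so that the CDF is piecewise linear and can be inverted by linear interpolation.

```python
import bisect
import random


def equidistant_grid(a_min, a_max, n):
    """Return n + 1 equally spaced points covering [a_min, a_max]."""
    step = (a_max - a_min) / n
    return [a_min + i * step for i in range(n + 1)]


def cumulative(probs):
    """CDF values at the grid points: 0 at the left end, 1 at the right end."""
    total = sum(probs)
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p / total)
    cdf[-1] = 1.0  # guard against floating-point drift
    return cdf


def sample_action(grid, cdf, u):
    """Invert the piecewise-linear CDF at u in [0, 1] by linear interpolation."""
    # Locate the interval [cdf[i], cdf[i + 1]] that contains u.
    i = bisect.bisect_right(cdf, u) - 1
    i = min(max(i, 0), len(grid) - 2)
    width = cdf[i + 1] - cdf[i]
    if width == 0.0:
        return grid[i]  # interval carries no probability mass
    t = (u - cdf[i]) / width
    return grid[i] + t * (grid[i + 1] - grid[i])


# Hypothetical usage: three intervals on [0, 3] with assumed probabilities.
grid = equidistant_grid(0.0, 3.0, 3)
cdf = cumulative([0.25, 0.25, 0.5])
action = sample_action(grid, cdf, random.random())
```

Under these assumptions the sampled action can take any value in the interval rather than only the grid points, which is the property the abstract attributes to the CDF-based model. How the probabilities are updated from the environment's feedback is outside the scope of this sketch.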

Notes

  1. The parameter space is the scope where we search for an optimum.

  2. $A$ could be either a finite set of points $\{\alpha_1, \alpha_2, \cdots, \alpha_r\}$ or a continuous interval $(\alpha_{min}, \alpha_{max})$ chosen from the real line, corresponding to FALA and CALA respectively.

  3. Throughout the paper, $\beta = 1$ means that the environment rewards the selected action to the maximum extent, and vice versa.

  4. Throughout the paper, $D_\alpha$ does not change over time. That is, only a stationary random environment is considered.

  5. This is exactly the order in which we introduced the existing algorithms in Section 2.2.

  6. The parameters of FALA are I: n=7; II: n=7; III: r=5; IV: D=600.

  7. Different cases represent different initial parameters $\mu_0$ and $\sigma_0$ in CALA, which are (3,5), (3,6), (-10,5), (-10,7), (10,5), (10,7) and (7,3) respectively.

Acknowledgments

This research work is funded by the National Science Foundation of China (61271316), Key Laboratory for Shanghai Integrated Information Security Management Technology Research, and Chinese National Engineering Laboratory for Information Content Analysis Technology.

Author information

Corresponding author

Correspondence to Ying Guo.

About this article

Cite this article

Guo, Y., Ge, H. & Li, S. A set of novel continuous action-set reinforcement learning automata models to optimize continuous functions. Appl Intell 46, 845–864 (2017). https://doi.org/10.1007/s10489-016-0853-4

  • DOI: https://doi.org/10.1007/s10489-016-0853-4
