Abstract
Learning automata (LA), a powerful reinforcement learning tool within Artificial Intelligence, can adaptively search for the optimal state in a random environment. Over the past decades, many finite action-set learning automata (FALA) algorithms have been developed to maturity, yet they expose critical defects when applied to optimizing continuous functions. To overcome these shortcomings and obtain a higher-performance LA, we propose a novel continuous action-set learning automaton (CALA) algorithm that solves function optimization problems via one class of LA prototypes, i.e., the continuous action-set reinforcement learning automata, abbreviated as CARLA. The key mechanism of the proposed algorithm is a combination of equidistant discretization and linear interpolation. Specifically, four categories of application models are constructed. Two of them generate continuous actions from finite prior information, thereby avoiding the drawbacks of FALA; this functionality relies on the cumulative distribution function (CDF) and on a new concept, the area surrounded by curves (AsbC), respectively. The other two models are modified versions that balance the trade-off between accuracy and speed. Moreover, all of these models are extended to generalized versions so that multidimensional function optimization problems can be handled as well. Extensive experiments covering four benchmarks and three scenarios demonstrate the effectiveness and efficiency of the proposed application models. The proposed algorithm outperforms state-of-the-art LA and optimization algorithms, achieving high accuracy, fast convergence, and competitive time consumption, especially in noisy environments.
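As a rough illustration of the key mechanism named above, the sketch below draws a continuous action by discretizing a density on an equidistant grid, accumulating it into a CDF, and inverting that CDF with linear interpolation (inverse-transform sampling). This is a minimal sketch only: the grid size r, the interval bounds, the Gaussian-shaped density, and the function name sample_action are illustrative assumptions, not the paper's exact CDF model.

```python
import numpy as np

# Minimal sketch: equidistant discretization + linear interpolation of a CDF
# to obtain a continuous action. All names and parameters are illustrative.

def sample_action(density, a_min, a_max, r=100, seed=None):
    rng = np.random.default_rng(seed)
    grid = np.linspace(a_min, a_max, r)   # equidistant discretization of the action interval
    pdf = density(grid)
    cdf = np.cumsum(pdf)
    cdf = cdf / cdf[-1]                   # accumulate and normalize into a CDF on [0, 1]
    u = rng.uniform()                     # one uniform draw
    # Invert the discretized CDF by linear interpolation -> a continuous action
    return float(np.interp(u, cdf, grid))

# Example: an unnormalized Gaussian-shaped density centered at 3.0
gaussian = lambda x: np.exp(-0.5 * ((x - 3.0) / 2.0) ** 2)
print(sample_action(gaussian, -10.0, 10.0, seed=0))
```

Because the CDF is stored only at the r grid points, refining the grid trades speed for accuracy, which is the trade-off the modified models in the abstract are said to balance.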
Notes
The parameter space is the region over which we search for an optimum.
$A$ could be either a finite set of points $\{\alpha_1, \alpha_2, \dots, \alpha_r\}$ or a continuous interval $(\alpha_{\min}, \alpha_{\max})$ of the real line, corresponding to FALA and CALA respectively.
Throughout the paper, $\beta = 1$ means the environment rewards the selected action to the maximum extent, and vice versa.
Throughout the paper, $D_\alpha$ does not change over time; that is, only a stationary random environment is considered (a minimal sketch follows these notes).
This is exactly the order in which the existing algorithms were introduced in Section 2.2.
The parameters of FALA are I: n=7; II: n=7; III: r=5; IV: D=600.
Different cases represent different initial parameters $\mu_0$ and $\sigma_0$ in CALA, which are (3,5), (3,6), (−10,5), (−10,7), (10,5), (10,7), and (7,3) respectively.
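For concreteness, here is a minimal sketch of the stationary random environment referred to above: the reward $\beta \in [0,1]$ for an action $\alpha$ is drawn from a fixed, time-invariant distribution $D_\alpha$ whose mean peaks at the (unknown) optimum. The target location, the Gaussian reward shape, the noise level, and the name environment are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a stationary random environment: the reward law D_alpha
# for each action alpha is fixed over time. Shapes and constants are
# illustrative assumptions only.

def environment(alpha, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    target = 2.5                                         # the unknown optimum (illustration only)
    mean_reward = np.exp(-0.5 * (alpha - target) ** 2)   # expected reward peaks at the optimum
    noise = rng.normal(0.0, 0.1)                         # stationary: same noise law at every step
    return float(np.clip(mean_reward + noise, 0.0, 1.0))  # beta = 1 is the maximal reward
```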
Acknowledgments
This research work is funded by the National Science Foundation of China (61271316), Key Laboratory for Shanghai Integrated Information Security Management Technology Research, and Chinese National Engineering Laboratory for Information Content Analysis Technology.