Abstract
In this paper, we propose a method to reduce the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions as well as executed ones, and adding a terminal reward to Q-values that have not yet been visited. To verify the method, its performance was compared with that of conventional Q-learning. The proposed approach matched the performance of conventional Q-learning while requiring only 27% of the learning episodes. Accordingly, we verified that the proposed method reduces learning time by updating more Q-values in the early stage of learning and by distributing the terminal reward across more Q-values.
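The abstract names two extensions to the standard Q-learning update. The sketch below illustrates what such a scheme could look like; the update rules for unexecuted actions and for terminal-reward propagation are assumptions for illustration (the paper's own equations are not reproduced here), and the state/action sizes, learning rate, and discount factor are placeholder values.

```python
import numpy as np

# Standard one-step Q-learning (Watkins & Dayan, 1992) plus two hedged
# extensions in the spirit of the abstract:
#   (1) update Q-values of unexecuted actions in the visited state, and
#   (2) distribute the terminal reward backwards to earlier Q-values.
# Both extension rules below are illustrative assumptions, not the
# paper's exact formulation.

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA = 0.5, 0.9

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next, done):
    """Conventional one-step Q-learning update for the executed action."""
    target = r if done else r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (target - Q[s, a])

def update_unexecuted(s, a_exec, predicted_next):
    """Assumed extension (1): also update unexecuted actions in state s,
    using a model that predicts their successor states (predicted_next
    maps action -> next state; a hypothetical helper)."""
    for a, s_next in predicted_next.items():
        if a != a_exec:
            Q[s, a] += ALPHA * (GAMMA * Q[s_next].max() - Q[s, a])

def propagate_terminal_reward(path, terminal_reward):
    """Assumed extension (2): distribute a discounted share of the
    terminal reward backwards along the episode's state-action path."""
    for k, (s, a) in enumerate(reversed(path)):
        Q[s, a] += ALPHA * (GAMMA ** k) * terminal_reward

# Tiny demonstration on an abstract 2-step episode.
q_update(0, 1, 1.0, 1, done=False)          # Q[0, 1] becomes 0.5
propagate_terminal_reward([(0, 1), (1, 0)], terminal_reward=10.0)
print(Q[0, 1], Q[1, 0])                      # both 5.0
```

Because the terminal reward reaches every state-action pair on the episode path (discounted by distance), many more Q-values carry useful signal after the first successful episode than under the conventional one-step update, which is consistent with the speed-up the abstract reports.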
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).
Copyright information
© 2013 Springer Science+Business Media Dordrecht
Cite this paper
Sung, Y., Ahn, E., Cho, K. (2013). Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation. In: Kim, K., Chung, KY. (eds) IT Convergence and Security 2012. Lecture Notes in Electrical Engineering, vol 215. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5860-5_121
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5859-9
Online ISBN: 978-94-007-5860-5