Abstract
In this paper, we propose a method to reduce the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions as well as executed ones, and adding a terminal reward to Q-values that have not yet been visited. To verify the method, its performance was compared with that of conventional Q-learning. The proposed approach matched the performance of conventional Q-learning while requiring only 27% of the learning episodes. Accordingly, we verified that the proposed method reduces learning time by updating more Q-values in the early stage of learning and by distributing the terminal reward across more Q-values.
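The abstract names two extensions to the standard Q-learning update. The sketch below illustrates what such a scheme could look like; the update rules for unexecuted actions and for terminal-reward propagation are assumptions for illustration (the paper's own equations are not reproduced here), and the state/action sizes, learning rate, and discount factor are placeholder values.

```python
import numpy as np

# Standard one-step Q-learning (Watkins & Dayan, 1992) plus two hedged
# extensions in the spirit of the abstract:
#   (1) update Q-values of unexecuted actions in the visited state, and
#   (2) distribute the terminal reward backwards to earlier Q-values.
# Both extension rules below are illustrative assumptions, not the
# paper's exact formulation.

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA = 0.5, 0.9

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next, done):
    """Conventional one-step Q-learning update for the executed action."""
    target = r if done else r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (target - Q[s, a])

def update_unexecuted(s, a_exec, predicted_next):
    """Assumed extension (1): also update unexecuted actions in state s,
    using a model that predicts their successor states (predicted_next
    maps action -> next state; a hypothetical helper)."""
    for a, s_next in predicted_next.items():
        if a != a_exec:
            Q[s, a] += ALPHA * (GAMMA * Q[s_next].max() - Q[s, a])

def propagate_terminal_reward(path, terminal_reward):
    """Assumed extension (2): distribute a discounted share of the
    terminal reward backwards along the episode's state-action path."""
    for k, (s, a) in enumerate(reversed(path)):
        Q[s, a] += ALPHA * (GAMMA ** k) * terminal_reward

# Tiny demonstration on an abstract 2-step episode.
q_update(0, 1, 1.0, 1, done=False)          # Q[0, 1] becomes 0.5
propagate_terminal_reward([(0, 1), (1, 0)], terminal_reward=10.0)
print(Q[0, 1], Q[1, 0])                      # both 5.0
```

Because the terminal reward reaches every state-action pair on the episode path (discounted by distance), many more Q-values carry useful signal after the first successful episode than under the conventional one-step update, which is consistent with the speed-up the abstract reports.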
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).
Copyright information
© 2013 Springer Science+Business Media Dordrecht
Cite this paper
Sung, Y., Ahn, E., Cho, K. (2013). Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation. In: Kim, K., Chung, KY. (eds) IT Convergence and Security 2012. Lecture Notes in Electrical Engineering, vol 215. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5860-5_121
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5859-9
Online ISBN: 978-94-007-5860-5