Abstract
Reinforcement learning is an area of machine learning that does not require detailed teaching signals from a human, and it is expected to be applied to real robots. In such applications, learning must finish within a short period of time. Model-free reinforcement learning methods converge quickly on tasks such as Sutton’s maze problem, where the aim is to reach the target state in minimum time. However, these methods have difficulty learning tasks in which a stable state must be kept for as long as possible. In this study, we improve the reward allocation method for such stabilizing control tasks, using a semi-Markov decision process as the environment model. The validity of our method is demonstrated through simulation of the stabilizing control of an inverted pendulum.
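
To make the role of the semi-Markov decision process concrete, the following minimal sketch computes a discounted return when transitions take variable amounts of time. It is not the reward allocation method proposed in the article, which the abstract does not specify. The function name `smdp_discounted_return`, the `(reward, duration)` data layout, and the convention that each reward arrives at the end of its transition are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def smdp_discounted_return(
    transitions: Sequence[Tuple[float, float]],
    gamma: float = 0.95,
) -> float:
    """Discounted return over variable-duration transitions.

    Each item is a hypothetical (reward, duration) pair. Under the assumed
    convention, the reward is received when the transition ends, so it is
    discounted by gamma raised to the total time elapsed up to that point.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")

    elapsed = 0.0  # total time elapsed since the start of the episode
    total = 0.0    # accumulated discounted reward
    for reward, duration in transitions:
        if duration <= 0.0:
            raise ValueError("each transition must have a positive duration")
        elapsed += duration
        total += (gamma ** elapsed) * reward
    return total


if __name__ == "__main__":
    # Two transitions: one lasting 1 time unit, one lasting 2 time units.
    print(smdp_discounted_return([(1.0, 1.0), (1.0, 2.0)], gamma=0.5))
```

With unit durations this reduces to the ordinary discrete-time discounted return, which is why the ordinary Markov decision process can be seen as a special case of this setting.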






Additional information
This work was presented in part at the 17th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 19–21, 2012.
About this article
Cite this article
Hosokawa, S., Kato, J. & Nakano, K. A reward allocation method for reinforcement learning in stabilizing control tasks. Artif Life Robotics 19, 109–114 (2014). https://doi.org/10.1007/s10015-014-0146-0
DOI: https://doi.org/10.1007/s10015-014-0146-0