A reward allocation method for reinforcement learning in stabilizing control tasks

Hosokawa, Shu; Kato, Joji; Nakano, Kazushi

doi:10.1007/s10015-014-0146-0

Original Article
Published: 29 July 2014

Volume 19, pages 109–114 (2014)

Artificial Life and Robotics

Shu Hosokawa¹, Joji Kato¹ & Kazushi Nakano¹

Abstract

Reinforcement learning is an area of machine learning that does not require detailed teaching signals from a human, and it is expected to be applied to real robots. In applications to real robots, learning must finish within a short period of time. Model-free reinforcement learning methods converge quickly in tasks such as Sutton’s maze problem, where the aim is to reach a target state in minimum time. However, these methods have difficulty learning tasks in which a stable state must be maintained for as long as possible. In this study, we improve the reward allocation method for stabilizing control tasks. For these tasks, we use a semi-Markov decision process as the environment model. The validity of our method is demonstrated through simulation of stabilizing control of an inverted pendulum.
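
The abstract does not state the allocation rule itself, so the following is only an illustrative sketch of what "reward allocation" can mean, not the authors' method. It assumes a generic scheme in which a single signal received at the end of an episode is spread backwards over the preceding steps with a geometric decay. The function name `allocate_rewards` and the parameters `terminal_signal` and `decay` are hypothetical. The sketch contrasts a goal-reaching episode, which ends with a positive signal, with a stabilizing episode, where the only signal is a penalty when the pendulum falls.

```python
from typing import List


def allocate_rewards(episode_length: int, terminal_signal: float, decay: float) -> List[float]:
    """Spread one terminal signal backwards over the steps of an episode.

    Hypothetical illustration only: step t (0-based) receives
    terminal_signal * decay ** (episode_length - 1 - t), so the final step
    gets the full signal and earlier steps get geometrically less.
    """
    if episode_length <= 0:
        return []
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    return [
        terminal_signal * decay ** (episode_length - 1 - t)
        for t in range(episode_length)
    ]


# Goal-reaching episode: a positive signal arrives when the target is reached.
goal_credits = allocate_rewards(episode_length=4, terminal_signal=1.0, decay=0.5)

# Stabilizing episode: the only signal is a penalty when the pendulum falls.
fall_credits = allocate_rewards(episode_length=4, terminal_signal=-1.0, decay=0.5)

print(goal_credits)
print(fall_credits)
```

Under this assumed scheme, a longer stabilizing episode pushes the penalty further from the early actions. That is one way to see why allocation rules designed for reaching a goal quickly may not transfer directly to tasks that should last as long as possible.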

Author information

Authors and Affiliations

Department of Electronic Engineering, The University of Electro-Communications, 1-5-1 Chofu-ga-oka, Tokyo, Chofu, 182-8585, Japan
Shu Hosokawa, Joji Kato & Kazushi Nakano

Corresponding author

Correspondence to Shu Hosokawa.

Additional information

This work was presented in part at the 17th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 19–21, 2012.

About this article

Cite this article

Hosokawa, S., Kato, J. & Nakano, K. A reward allocation method for reinforcement learning in stabilizing control tasks. Artif Life Robotics 19, 109–114 (2014). https://doi.org/10.1007/s10015-014-0146-0

Received: 14 September 2012
Accepted: 19 May 2014
Published: 29 July 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10015-014-0146-0
