
A reward allocation method for reinforcement learning in stabilizing control tasks

  • Original Article
  • Artificial Life and Robotics

Abstract

Reinforcement learning is an area of machine learning that does not require detailed teaching signals from a human, and it is therefore expected to be applied to real robots. For such applications, the learning process must be completed within a short period of time. Model-free reinforcement learning methods converge quickly on tasks such as Sutton's maze problem, in which the goal is to reach a target state in the minimum time. However, these methods have difficulty learning tasks in which the goal is to maintain a stable state for as long as possible. In this study, we improve the reward allocation method for stabilizing control tasks. For stabilizing control tasks, we use a semi-Markov decision process as the environment model. The validity of our method is demonstrated through a simulation of the stabilizing control of an inverted pendulum.
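The abstract does not spell out the proposed allocation rule, but the idea of rewarding long survival in a keep-stable task can be illustrated with a small tabular example. The Python sketch below balances a simplified torque-controlled pendulum and, at the end of each episode, distributes a reward proportional to the survival time over the visited state-action pairs with geometrically decaying credit (a profit-sharing-style rule). The dynamics constants, the discretization, and the allocate_reward function are illustrative assumptions for this sketch, not the method proposed in the paper.

import numpy as np

# Minimal sketch of episode-wise reward allocation for a keep-stable task,
# illustrated on a torque-controlled pendulum. The constants, discretization,
# and credit-decay rule are illustrative assumptions, not the paper's method.

G, L, M, DT = 9.8, 1.0, 1.0, 0.02          # gravity, length, mass, time step
TORQUES = np.array([-2.0, 0.0, 2.0])       # discrete action set
N_TH, N_OM = 15, 15                        # bins for angle and angular velocity
FAIL_ANGLE = 0.5                           # episode ends beyond about 28.6 degrees

def discretize(theta, omega):
    # Map the continuous state to a single table index.
    i = np.clip(int((theta + FAIL_ANGLE) / (2 * FAIL_ANGLE) * N_TH), 0, N_TH - 1)
    j = np.clip(int((omega + 3.0) / 6.0 * N_OM), 0, N_OM - 1)
    return i * N_OM + j

def step(theta, omega, torque):
    # Simple forward-Euler pendulum dynamics (illustrative only).
    alpha = (G / L) * np.sin(theta) + torque / (M * L ** 2)
    omega += alpha * DT
    theta += omega * DT
    return theta, omega

def allocate_reward(Q, trajectory, survival_steps, lr=0.1, decay=0.9):
    # Assumed rule: the longer the pendulum survived, the larger the reward;
    # credit decays geometrically from the last step backwards, so recent
    # decisions receive most of it (profit-sharing-style allocation).
    reward = survival_steps
    credit = 1.0
    for s, a in reversed(trajectory):
        Q[s, a] += lr * credit * reward
        credit *= decay

Q = np.zeros((N_TH * N_OM, len(TORQUES)))
for episode in range(2000):
    theta, omega = np.random.uniform(-0.05, 0.05, size=2)
    trajectory = []
    for t in range(500):
        s = discretize(theta, omega)
        a = np.random.randint(3) if np.random.rand() < 0.1 else int(np.argmax(Q[s]))
        trajectory.append((s, a))
        theta, omega = step(theta, omega, TORQUES[a])
        if abs(theta) > FAIL_ANGLE:
            break
    allocate_reward(Q, trajectory, survival_steps=t + 1)

Under this allocation, longer episodes raise the values of the actions that kept the pendulum near upright, which is the qualitative behavior a reward scheme for stabilizing control needs, in contrast to goal-reaching tasks where reward is tied to arriving at a target state quickly.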


References

  1. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  2. Grefenstette JJ (1988) Credit assignment in rule discovery systems based on genetic algorithms. In: Shavlik JW, Dietterich TG (eds) Readings in machine learning. Kaufmann, San Mateo, pp 524–534

  3. Ito J, Nakano K, Sakurama K, Hosokawa S (2008) Adaptive immunity based reinforcement learning. Artif Life Robot 13(1):188–193

  4. Peng J (1993) Efficient dynamic programming-based learning for control. PhD thesis, Northeastern University

  5. Streeter T, Oliver J, Sannier A (2006) Verve: a general purpose open source reinforcement learning toolkit. In: ASME conference proceedings, vol 4255X, pp 359–369

  6. Watkins CJCH, Dayan P (1992) Technical note: Q-learning. Mach Learn 8(3–4):279–292

  7. Hosokawa S, Nakano K, Sakurama K (2010) A consideration of human immunity-based reinforcement learning with continuous states. Artif Life Robot 15(4):560–564

  8. Arai S, Sycara K, Payne TR (2000) Experience-based reinforcement learning to acquire effective behavior in a multi-agent domain

  9. Miyazaki K, Yamamura M, Kobayashi S (1994) A theory of profit sharing in reinforcement learning. J Jpn Soc Artif Intell 9:580–587 (in Japanese)

  10. Zheng Y, Luo S, Lv Z (2006) Control double inverted pendulum by reinforcement learning with double CMAC network. In: Proceedings of the 18th international conference on pattern recognition (ICPR 2006), vol 4

  11. Atsushi S, Tohgoroh M, Hirohisa S (2003) Profit sharing considering penalty. In: The 17th annual conference of the Japanese Society for Artificial Intelligence, paper 3F4-02 (in Japanese)

  12. Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112:181–211

  13. Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Department of Engineering, Cambridge University

Author information

Corresponding author

Correspondence to Shu Hosokawa.

Additional information

This work was presented in part at the 17th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 19–21, 2012.

About this article

Cite this article

Hosokawa, S., Kato, J. & Nakano, K. A reward allocation method for reinforcement learning in stabilizing control tasks. Artif Life Robotics 19, 109–114 (2014). https://doi.org/10.1007/s10015-014-0146-0
