Development of a reinforcement learning system to play Othello

  • Original Article
Artificial Life and Robotics

Abstract

In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, however, it is also important to acquire a penalty-avoiding policy, that is, a policy that avoids losing the game. The penalty avoiding rational policy making algorithm (PARP) is known to learn such a policy. When PARP is applied to large-scale problems, however, it is confronted with an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello, and we show that our learning player beats the well-known Othello program KITTY.
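As a rough illustration of what "penalty avoiding" can mean in practice, the Python sketch below marks state-action pairs ("rules") that lead to a loss and propagates that mark backwards through a table of observed transitions. It is a simplified sketch under stated assumptions, not the authors' implementation of PARP: the function names `find_penalty_rules` and `safe_actions`, the toy states, and the explicit transition table are all hypothetical and do not come from the article.

```python
from collections import defaultdict


def find_penalty_rules(transitions, direct_penalty_rules):
    """Propagate penalty marks backwards through observed transitions.

    transitions: dict mapping (state, action) -> set of observed successor states.
    direct_penalty_rules: set of (state, action) pairs that directly incurred
        a penalty (for example, a move that immediately lost the game).
    Returns (penalty_rules, penalty_states).
    """
    # Collect every action known for each state.
    actions_of = defaultdict(set)
    for state, action in list(transitions) + list(direct_penalty_rules):
        actions_of[state].add(action)

    penalty_rules = set(direct_penalty_rules)
    penalty_states = set()
    changed = True
    while changed:
        changed = False
        # A state is a penalty state when every known action in it is a penalty rule.
        for state, actions in actions_of.items():
            if state not in penalty_states and all(
                (state, a) in penalty_rules for a in actions
            ):
                penalty_states.add(state)
                changed = True
        # A rule is a penalty rule when it can lead to a penalty state.
        for rule, successors in transitions.items():
            if rule not in penalty_rules and successors & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


def safe_actions(state, actions, penalty_rules):
    """Return the actions in `state` that are not marked as penalty rules."""
    return [a for a in actions if (state, a) not in penalty_rules]


# Toy example (hypothetical): both moves in s1 lose immediately,
# so s1 is a penalty state and the move from s0 into s1 should be avoided.
toy_transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s2", "a"): {"s3"},
}
toy_direct = {("s1", "a"), ("s1", "b")}

rules, states = find_penalty_rules(toy_transitions, toy_direct)
print(safe_actions("s0", ["a", "b"], rules))
```

The explicit table of transitions is also what makes the state explosion mentioned in the abstract concrete: for a game the size of Othello, such a table cannot be enumerated directly, which is why the article turns to additional ideas and heuristics.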

Author information

Corresponding author

Correspondence to Kazuteru Miyazaki.

About this article

Cite this article

Miyazaki, K., Tsuboi, S. & Kobayashi, S. Development of a reinforcement learning system to play Othello. Artificial Life and Robotics 7, 177–181 (2004). https://doi.org/10.1007/BF02471202

  • DOI: https://doi.org/10.1007/BF02471202
