Development of a reinforcement learning system to play Othello

Miyazaki, Kazuteru; Tsuboi, Sougo; Kobayashi, Shigenobu

doi:10.1007/BF02471202

Original Article
Published: April 2004

Artificial Life and Robotics, Volume 7, pages 177–181 (2004)

Kazuteru Miyazaki¹,
Sougo Tsuboi² &
Shigenobu Kobayashi³

Abstract

In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, however, it is more important to acquire a penalty-avoiding policy, that is, a policy that avoids losing the game. The penalty avoiding rational policy making algorithm (PARP) is known to learn such a policy. When PARP is applied to large-scale problems, however, it is confronted with an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello, and we show that our learning player beats the well-known Othello program KITTY.
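To give a rough sense of the state explosion the abstract refers to, the short sketch below computes a naive upper bound on the number of Othello board configurations. It assumes a standard 8 x 8 board with three possible contents per square (empty, black, white). It is only an illustration and is not part of PARP or of the authors' method: the function name `naive_state_upper_bound` is hypothetical, and the bound also counts configurations that can never be reached in real play.

```python
# Illustrative only: a naive upper bound on Othello board configurations.
# Assumption: standard 8x8 board, each square is empty, black, or white.
BOARD_SQUARES = 8 * 8
CELL_VALUES = 3


def naive_state_upper_bound(squares: int = BOARD_SQUARES,
                            values: int = CELL_VALUES) -> int:
    """Count every possible colouring of the board, reachable or not."""
    return values ** squares


if __name__ == "__main__":
    bound = naive_state_upper_bound()
    # Scientific notation keeps the printed figure readable.
    print(f"naive upper bound: {bound:.3e} configurations")
```

Even as a loose overestimate, a count of this size suggests why a tabular method that enumerates states cannot be applied to Othello directly, and why additional ideas and heuristics are needed.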

Author information

Authors and Affiliations

National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuen-nishimachi, Kodaira, 187-8587, Tokyo, Japan
Kazuteru Miyazaki
Toshiba, Kawasaki, Kanagawa, Japan
Sougo Tsuboi
Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Shigenobu Kobayashi

Corresponding author

Correspondence to Kazuteru Miyazaki.

About this article

Cite this article

Miyazaki, K., Tsuboi, S. & Kobayashi, S. Development of a reinforcement learning system to play Othello. Artificial Life and Robotics 7, 177–181 (2004). https://doi.org/10.1007/BF02471202

Received: 29 September 2003
Accepted: 29 September 2003
Issue Date: April 2004
DOI: https://doi.org/10.1007/BF02471202
