Improvement of move naturalness for playing good-quality games with middle-level players

Abstract

In the field of games, computer programs have surpassed top human players in many games; a well-known example is AlphaZero. These strong programs give human players opportunities to improve their skills. However, human players may not enjoy facing such strong opponents. To let middle-level players learn by playing good-quality games against strong programs, our previous research proposed combining programs with distinct roles. One role is a superhuman program that proposes and accurately evaluates candidate moves. The other role is a naturalness (or human-likeness) evaluator. Candidate moves are scored by combining the two roles with a function, and the moves with the highest scores are played. This study builds upon our earlier work to further improve the naturalness of moves. First, we propose a search mechanism inspired by the sequential halving algorithm to decide the candidate moves and the moves to play. Second, we propose a new score function to address several issues of the previous approach. We conduct experiments comparing the proposed approaches with several existing approaches. The results show that the move naturalness of the proposed approaches is greatly improved and that performance in other aspects is at least as good as that of the existing approaches.
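
As an illustration of the two-role idea summarized above, the following is a minimal sketch, not the authors' actual implementation: `engine_loss` and `naturalness` are hypothetical evaluators standing in for a superhuman engine and a human-likeness model, and the simple weighted combination stands in for the paper's score function.

```python
# Minimal sketch of the two-role idea (illustrative only).
# engine_loss(move): estimated loss of a candidate move according to a superhuman engine.
# naturalness(move): human-likeness score from a separate evaluator.
# Both callables are hypothetical; the paper's actual score function differs.

def select_move(candidates, engine_loss, naturalness, weight=1.0):
    """Return the candidate move with the highest combined score."""
    return max(candidates, key=lambda m: naturalness(m) - weight * engine_loss(m))
```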

Availability of data and materials

The game records generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. This is widely known among Go players, and we have verified it with the necessary data. The average Euclidean distance from a KataGo move to the preceding move was 4.649, while that for middle-level human players was 3.426. KataGo used 1,000 simulations per move; the program version was 20d34784703c5b4000643d3ccc43bb37d418f3b5 and the neural network version was kata1-b40c256-s9948109056-d2425397051. The middle-level human players’ data are from Table 4.
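
The distance statistic in this note can be reproduced along the following lines; a minimal sketch assuming moves are given as (column, row) coordinates on the 19×19 board and that passes have already been removed.

```python
import math

def mean_move_distance(moves):
    """Average Euclidean distance between each move and the preceding move."""
    dists = [math.dist(prev, cur) for prev, cur in zip(moves, moves[1:])]
    return sum(dists) / len(dists) if dists else 0.0

# Example: mean_move_distance([(16, 3), (3, 15), (15, 16), (3, 3)])
```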

  2. KGS (https://www.gokgs.com) is an online Go server where players, including humans and computer programs, play Go and are assigned rankings.

  3. https://sjeng.org/zero/best_v1.txt.zip, released along with the Leela Zero project [12]. The input of the network consists of 17 binary planes of size 19\(\times \)19: 16 = 8\(\times \)2 planes represent the black and white stones for the 8 most recent board states, and the remaining plane indicates which player is to move. The output consists of a 362-dimensional policy over the possible moves (19\(\times \)19 intersections + PASS) and a 1-dimensional value predicting the degree of advantage of the player to move. The network was trained on openly available games of strong human players; see https://github.com/leela-zero/leela-zero/issues/628.
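
A sketch of the input and output shapes described in this note (illustrative only, not the Leela Zero loading code); the exact plane ordering below is an assumption.

```python
import numpy as np

def build_input(black_history, white_history, black_to_move):
    """Assemble the 17 binary 19x19 input planes described above.

    black_history, white_history: arrays of shape (8, 19, 19) holding the black
    and white stones of the 8 most recent board states (assumed ordering).
    """
    planes = np.zeros((17, 19, 19), dtype=np.float32)
    planes[0:8] = black_history
    planes[8:16] = white_history
    planes[16] = 1.0 if black_to_move else 0.0  # side-to-move indicator plane
    return planes

POLICY_SIZE = 19 * 19 + 1  # 361 intersections + PASS = 362-dimensional policy
VALUE_SIZE = 1             # scalar value for the player to move
```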

  4. For games other than Go, we consider that the idea of separating roles is also applicable. Taking chess as an example, one may use Stockfish [25] as the superhuman role and Maia [20] as the naturalness evaluator.

  5. The term and notation used in our previous work were ideal loss and \(l^*\). However, the word “ideal” was misleading, and “target” is more appropriate.

  6. https://www.gnu.org/software/gnugo/

  7. https://github.com/pasky/pachi

  8. We confirmed that the top 25 moves generally included those that human players might consider, and that moves outside the top 25 generally need not be considered, although in rare cases some critical moves fall outside the top 25, as shown in Section 4.1.1. The value 25 provided a good balance between strength adjustment and thinking time.

  9. https://github.com/featurecat/go-dataset

  10. The program version was 20d34784703c5b4000643d3ccc43bb37d418f3b5 and the neural network was kata1-b40c256-s9948109056-d2425397051.

  11. Another way to create weaker players is to do supervised learning on weaker players’ games, as McIlroy-Young et al. [20] did. However, it is much more expensive to obtain neural networks with the desired strength.

  12. With an optimistic komi of k, KataGo evaluates advantages as if the player to move has k more points. Assume a board state close to the end of a game with two candidate moves, one leading to a win by \(+\)0.5 points and the other to a loss by −0.5 points. When doing MCTS with the normal komi, the former move’s win rate is close to 100% while the latter’s is close to 0%; thus, the latter move is rarely visited and is deleted due to the visit-count thresholds (i.e., \(n_{max}\times R_{th}\) for P3 and 10 for P4 (line 5 in Algorithm 1)). In the experiments, k was set to 4. For the same example, KataGo then evaluates the former move as a win by \(+\)4.5 points and the latter as a win by \(+\)3.5 points, both with win rates close to 100%; thus, the latter move has a chance of being selected.
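
The arithmetic in this note can be made explicit with a small sketch:

```python
# Worked version of the example above: under the normal komi the two candidate
# moves lead to point margins of +0.5 (a near-certain win) and -0.5 (a
# near-certain loss), so MCTS starves the second move of visits. Adding the
# optimistic komi k = 4 shifts both margins to wins, so both moves keep win
# rates close to 100% and the second move can still be selected.
k = 4
margins = [+0.5, -0.5]                 # margins under the normal komi
optimistic = [m + k for m in margins]  # [4.5, 3.5]: both now count as wins
```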

  13. Conversely, moves’ advantages might also be underestimated with only a few simulations, and the new search mechanism of P5 could also alleviate this problem. However, this issue does not relate to playing moves with big losses and is beyond the scope of this discussion.

  14. https://en.wikipedia.org/wiki/Phi_coefficient

References

  1. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap TP, Simonyan K, Hassabis D (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144. https://doi.org/10.1126/science.aar6404

  2. Hollosi A, Pahle M (2018) Teaching Game at Sensei’s Library. https://senseis.xmp.net/?TeachingGame. Accessed 16 Dec 2022

  3. Hsueh C-H, Ikeda K (2022) Playing good-quality games with weak players by combining programs with different roles. In: 2022 IEEE Conf. on Games (CoG), pp 612–615. https://doi.org/10.1109/CoG51982.2022.9893698

  4. Karnin Z, Koren T, Somekh O (2013) Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol 28, pp 1238–1246. https://proceedings.mlr.press/v28/karnin13.html

  5. Fabiano N, Cazenave T (2022) Sequential halving using scores. In: Lecture Notes in Computer Science, pp 41–52. https://doi.org/10.1007/978-3-031-11488-5_4

  6. van den Herik HJ, Uiterwijk JWHM, van Rijswijck J (2002) Games solved: Now and in the future. Artif Intell 134(1–2):277–311. https://doi.org/10.1016/S0004-3702(01)00152-7

  7. Coulom R (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. In: Computers and Games, pp 72–83. https://doi.org/10.1007/978-3-540-75538-8_7

  8. Chaslot GMJB, Winands MHM, van den Herik HJ, Uiterwijk JWHM, Bouzy B (2008) Progressive strategies for Monte Carlo tree search. New Math Nat Comput 4(3):343–357. https://doi.org/10.1142/S1793005708001094

  9. Ikeda K, Viennot S (2013) Efficiency of static knowledge bias in Monte-Carlo tree search. In: The 8th international conference on computers and games (CG 2013), pp 26–38. https://doi.org/10.1007/978-3-319-09165-5_3

  10. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489. https://doi.org/10.1038/nature16961

  11. Tian Y, Ma J, Gong Q, Sengupta S, Chen Z, Pinkerton J, Zitnick L (2019) ELF OpenGo: An analysis and open reimplementation of AlphaZero. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning (ICML 2019). Proceedings of Machine Learning Research vol 97, pp 6244–6253. https://proceedings.mlr.press/v97/tian19a.html

  12. Leela Zero (2017) GitHub - leela-zero/leela-zero: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper. https://github.com/leela-zero/leela-zero. Accessed 31 Oct 2022

  13. Wu DJ (2020) Accelerating self-play learning in Go. In: The 34th AAAI Conf. Artif. Intell. (AAAI-20). Workshop Reinforcement Learn Games. https://arxiv.org/abs/1902.10565

  14. Moschovitis P, Denisova A (2023) Keep calm and aim for the head: Biofeedback-controlled dynamic difficulty adjustment in a horror game. IEEE Trans on Games 15(3):368–377. https://doi.org/10.1109/tg.2022.3179842

  15. Shohieb SM, Doenyas C, Elhady AM (2022) Dynamic difficulty adjustment technique-based mobile vocabulary learning game for children with autism spectrum disorder. Entertain Comput 42:100495. https://doi.org/10.1016/j.entcom.2022.100495

  16. Sephton N, Cowling PI, Slaven NH (2015) An experimental study of action selection mechanisms to create an entertaining opponent. In: 2015 IEEE Conf. on Comput. Intell. and Games (CIG), pp 122–129. https://doi.org/10.1109/CIG.2015.7317939

  17. Liu A-J, Wu T-R, Wu I-C, Guei H, Wei T-H (2020) Strength adjustment and assessment for MCTS-based programs [research frontier]. IEEE Comput Intell Mag 15(3):60–73. https://doi.org/10.1109/mci.2020.2998315

  18. Nakamichi T, Ito T (2018) Adjusting the evaluation function for weakening the competency level of a computer shogi program. ICGA J 40(1):15–31. https://doi.org/10.3233/ICG-180042

  19. Rosemarin H, Rosenfeld A (2019) Playing chess at a human desired level and style. In: The 7th Int. Conf. on Human-Agent Interact., pp 76–80. https://doi.org/10.1145/3349537.3351904

  20. McIlroy-Young R, Sen S, Kleinberg J, Anderson A (2020) Aligning superhuman AI with human behavior. In: The 26th ACM SIGKDD Int. Conf. on Knowl. Discovery & Data Mining, pp 1677–1687. https://doi.org/10.1145/3394486.3403219

  21. Jacob AP, Wu DJ, Farina G, Lerer A, Hu H, Bakhtin A, Andreas J, Brown N (2022) Modeling strong and human-like gameplay with KL-regularized search. In: Proceedings of the 39th international conference on machine learning. Proceedings of Machine Learning Research, vol 162, pp 9695–9728. https://proceedings.mlr.press/v162/jacob22a.html

  22. Baier H, Sattaur A, Powley EJ, Devlin S, Rollason J, Cowling PI (2019) Emulating human play in a leading mobile card game. IEEE Trans on Games 11(4):386–395. https://doi.org/10.1109/TG.2018.2835764

  23. Shi Y, Fan T, Li W, Hsueh C-H, Ikeda K (2021) Position control and production of various strategies for game of Go using deep learning methods. J of Inf Sci Eng 37(3):553–573. https://doi.org/10.6688/JISE.202105_37(3).0004

  24. Moon J, Choi Y, Park T, Choi J, Hong J-H, Kim K-J (2022) Diversifying dynamic difficulty adjustment agent by integrating player state models into Monte-Carlo tree search. Expert Syst Appl 205:117677. https://doi.org/10.1016/j.eswa.2022.117677

  25. official-stockfish (2008) GitHub - official-stockfish/Stockfish: UCI chess engine. https://github.com/official-stockfish/Stockfish. Accessed 18 Dec 2022

  26. Beal DF (1990) A generalised quiescence search algorithm. Artif Intell 43(1):85–98. https://doi.org/10.1016/0004-3702(90)90072-8


Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP20K12121, JP23K11381, and JP23K17021.

Author information

Contributions

Conceptualization: Kokolo Ikeda and Chu-Hsuan Hsueh; Methodology: Kokolo Ikeda and Chu-Hsuan Hsueh; Software: Chu-Hsuan Hsueh and Kokolo Ikeda; Validation: Chu-Hsuan Hsueh and Kokolo Ikeda; Formal analysis: Kokolo Ikeda and Chu-Hsuan Hsueh; Investigation: Chu-Hsuan Hsueh and Kokolo Ikeda; Resources: Kokolo Ikeda and Chu-Hsuan Hsueh; Data curation: Kokolo Ikeda and Chu-Hsuan Hsueh; Writing—original draft preparation: Chu-Hsuan Hsueh; Writing—review and editing: Kokolo Ikeda; Visualization: Chu-Hsuan Hsueh and Kokolo Ikeda; Supervision: Kokolo Ikeda; Project administration: Kokolo Ikeda; Funding acquisition: Kokolo Ikeda and Chu-Hsuan Hsueh. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chu-Hsuan Hsueh.

Ethics declarations

Ethical and informed consent for data used

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hsueh, CH., Ikeda, K. Improvement of move naturalness for playing good-quality games with middle-level players. Appl Intell 54, 1637–1655 (2024). https://doi.org/10.1007/s10489-023-05210-2
