Improvement of move naturalness for playing good-quality games with middle-level players

Abstract

In the field of games, computer programs have surpassed top human players in many games; a well-known example is AlphaZero. These strong programs give human players opportunities to improve their skills. However, human players may not enjoy facing such strong opponents. To let middle-level players learn by playing good-quality games against strong programs, our previous research proposed combining programs with distinct roles. One role is a superhuman program that proposes and accurately evaluates candidate moves. The other role is a naturalness (or human-likeness) evaluator. Candidate moves are scored by combining the two roles with a function, and the moves with the highest scores are played. This study builds upon our earlier work to further improve the naturalness of moves. First, we propose a search mechanism inspired by the sequential halving algorithm to decide the candidate moves and the moves to play. Second, we propose a new score function to address several issues of the previous approach. We conduct experiments comparing the proposed approaches with several existing approaches. The results show that the move naturalness of the proposed approaches is greatly improved and that performance in other aspects is at least as good as that of the existing approaches.
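
As an illustration of the two-role idea summarized above, the following is a minimal sketch, not the authors' actual implementation: `engine_loss` and `naturalness` are hypothetical evaluators standing in for a superhuman engine and a human-likeness model, and the simple weighted combination stands in for the paper's score function.

```python
# Minimal sketch of the two-role idea (illustrative only).
# engine_loss(move): estimated loss of a candidate move according to a superhuman engine.
# naturalness(move): human-likeness score from a separate evaluator.
# Both callables are hypothetical; the paper's actual score function differs.

def select_move(candidates, engine_loss, naturalness, weight=1.0):
    """Return the candidate move with the highest combined score."""
    return max(candidates, key=lambda m: naturalness(m) - weight * engine_loss(m))
```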

Availability of data and materials

The game records generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. This is widely known among Go players, and we have verified it with the necessary data. The average Euclidean distance from a KataGo move to the preceding move was 4.649, while that for middle-level human players was 3.426. KataGo used 1,000 simulations per move; the program version was 20d34784703c5b4000643d3ccc43bb37d418f3b5 and the neural network version was kata1-b40c256-s9948109056-d2425397051. The middle-level human players’ data are from Table 4.
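
The distance statistic in this note can be reproduced along the following lines; a minimal sketch assuming moves are given as (column, row) coordinates on the 19×19 board and that passes have already been removed.

```python
import math

def mean_move_distance(moves):
    """Average Euclidean distance between each move and the preceding move."""
    dists = [math.dist(prev, cur) for prev, cur in zip(moves, moves[1:])]
    return sum(dists) / len(dists) if dists else 0.0

# Example: mean_move_distance([(16, 3), (3, 15), (15, 16), (3, 3)])
```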

  2. KGS (https://www.gokgs.com) is an online Go server where players, including humans and computer programs, play Go and are assigned rankings.

  3. https://sjeng.org/zero/best_v1.txt.zip, released along with the Leela Zero project [12]. The input of the network consists of 17 binary planes of size 19\(\times \)19: 16 = 8\(\times \)2 planes represent the black and white stones for the 8 most recent board states, and the remaining plane indicates which player is to move. The output consists of a 362-dimensional policy over the possible moves (19\(\times \)19 intersections + PASS) and a 1-dimensional value predicting the degree of advantage of the player to move. The network was trained on openly available games of strong human players; see https://github.com/leela-zero/leela-zero/issues/628.
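
A sketch of the input and output shapes described in this note (illustrative only, not the Leela Zero loading code); the exact plane ordering below is an assumption.

```python
import numpy as np

def build_input(black_history, white_history, black_to_move):
    """Assemble the 17 binary 19x19 input planes described above.

    black_history, white_history: arrays of shape (8, 19, 19) holding the black
    and white stones of the 8 most recent board states (assumed ordering).
    """
    planes = np.zeros((17, 19, 19), dtype=np.float32)
    planes[0:8] = black_history
    planes[8:16] = white_history
    planes[16] = 1.0 if black_to_move else 0.0  # side-to-move indicator plane
    return planes

POLICY_SIZE = 19 * 19 + 1  # 361 intersections + PASS = 362-dimensional policy
VALUE_SIZE = 1             # scalar value for the player to move
```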

  4. For games other than Go, we consider that the idea of separating roles is also applicable. Taking chess as an example, one may use Stockfish [25] as the superhuman role and Maia [20] as the naturalness evaluator.

  5. The term and notation used in our previous work were ideal loss and \(l^*\). However, the word “ideal” was misleading, and “target” is more appropriate.

  6. https://www.gnu.org/software/gnugo/

  7. https://github.com/pasky/pachi

  8. We confirmed that the top 25 moves generally included those that human players might consider, and that moves outside the top 25 generally need not be considered, although in rare cases some critical moves fall outside the top 25, as shown in Section 4.1.1. The value 25 provided a good balance between strength adjustment and thinking time.

  9. https://github.com/featurecat/go-dataset

  10. The program version was 20d34784703c5b4000643d3ccc43bb37d418f3b5 and the neural network was kata1-b40c256-s9948109056-d2425397051.

  11. Another way to create weaker players is to do supervised learning on weaker players’ games, as McIlroy-Young et al. [20] did. However, it is much more expensive to obtain neural networks with the desired strength.

  12. With an optimistic komi of k, KataGo evaluates advantages as if the player to move has k more points. Assume a board state close to the end of a game with two candidate moves, one leading to a win by \(+\)0.5 points and the other to a loss by −0.5 points. When doing MCTS with the normal komi, the former move’s win rate is close to 100% while the latter’s is close to 0%; thus, the latter move is rarely visited and is deleted due to the visit-count thresholds (i.e., \(n_{max}\times R_{th}\) for P3 and 10 for P4 (line 5 in Algorithm 1)). In the experiments, k was set to 4. For the same example, KataGo then evaluates the former move as a win by \(+\)4.5 points and the latter as a win by \(+\)3.5 points, both with win rates close to 100%; thus, the latter move has a chance of being selected.
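
The arithmetic in this note can be made explicit with a small sketch:

```python
# Worked version of the example above: under the normal komi the two candidate
# moves lead to point margins of +0.5 (a near-certain win) and -0.5 (a
# near-certain loss), so MCTS starves the second move of visits. Adding the
# optimistic komi k = 4 shifts both margins to wins, so both moves keep win
# rates close to 100% and the second move can still be selected.
k = 4
margins = [+0.5, -0.5]                 # margins under the normal komi
optimistic = [m + k for m in margins]  # [4.5, 3.5]: both now count as wins
```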

  13. Conversely, moves’ advantages might also be underestimated with only a few simulations, and the new search mechanism of P5 could also alleviate this problem. However, this issue does not relate to playing moves with big losses and is beyond the scope of this discussion.

  14. https://en.wikipedia.org/wiki/Phi_coefficient

References

  1. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap TP, Simonyan K, Hassabis D (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144. https://doi.org/10.1126/science.aar6404

  2. Hollosi A, Pahle M (2018) Teaching Game at Sensei’s Library. https://senseis.xmp.net/?TeachingGame. Accessed 16 Dec 2022

  3. Hsueh C-H, Ikeda K (2022) Playing good-quality games with weak players by combining programs with different roles. In: 2022 IEEE Conf. on Games (CoG), pp 612–615. https://doi.org/10.1109/CoG51982.2022.9893698

  4. Karnin Z, Koren T, Somekh O (2013) Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol 28, pp 1238–1246. https://proceedings.mlr.press/v28/karnin13.html

  5. Fabiano N, Cazenave T (2022) Sequential halving using scores. In: Lecture Notes in Computer Science, pp 41–52. https://doi.org/10.1007/978-3-031-11488-5_4

  6. van den Herik HJ, Uiterwijk JWHM, van Rijswijck J (2002) Games solved: Now and in the future. Artif Intell 134(1–2):277–311. https://doi.org/10.1016/S0004-3702(01)00152-7

  7. Coulom R (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. In: Computers and Games, pp 72–83. https://doi.org/10.1007/978-3-540-75538-8_7

  8. Chaslot GMJB, Winands MHM, van den Herik HJ, Uiterwijk JWHM, Bouzy B (2008) Progressive strategies for Monte Carlo tree search. New Math Nat Comput 4(3):343–357. https://doi.org/10.1142/S1793005708001094

  9. Ikeda K, Viennot S (2013) Efficiency of static knowledge bias in Monte-Carlo tree search. In: The 8th international conference on computers and games (CG 2013), pp 26–38. https://doi.org/10.1007/978-3-319-09165-5_3

  10. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489. https://doi.org/10.1038/nature16961

  11. Tian Y, Ma J, Gong Q, Sengupta S, Chen Z, Pinkerton J, Zitnick L (2019) ELF OpenGo: An analysis and open reimplementation of AlphaZero. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning (ICML 2019). Proceedings of Machine Learning Research vol 97, pp 6244–6253. https://proceedings.mlr.press/v97/tian19a.html

  12. Leela Zero (2017) GitHub - leela-zero/leela-zero: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper. https://github.com/leela-zero/leela-zero. Accessed 31 Oct 2022

  13. Wu DJ (2020) Accelerating self-play learning in Go. In: The 34th AAAI Conf. Artif. Intell. (AAAI-20). Workshop Reinforcement Learn Games. https://arxiv.org/abs/1902.10565

  14. Moschovitis P, Denisova A (2023) Keep calm and aim for the head: Biofeedback-controlled dynamic difficulty adjustment in a horror game. IEEE Trans on Games 15(3):368–377. https://doi.org/10.1109/tg.2022.3179842

  15. Shohieb SM, Doenyas C, Elhady AM (2022) Dynamic difficulty adjustment technique-based mobile vocabulary learning game for children with autism spectrum disorder. Entertain Comput 42:100495. https://doi.org/10.1016/j.entcom.2022.100495

  16. Sephton N, Cowling PI, Slaven NH (2015) An experimental study of action selection mechanisms to create an entertaining opponent. In: 2015 IEEE Conf. on Comput. Intell. and Games (CIG), pp 122–129. https://doi.org/10.1109/CIG.2015.7317939

  17. Liu A-J, Wu T-R, Wu I-C, Guei H, Wei T-H (2020) Strength adjustment and assessment for MCTS-based programs [research frontier]. IEEE Comput Intell Mag 15(3):60–73. https://doi.org/10.1109/mci.2020.2998315

  18. Nakamichi T, Ito T (2018) Adjusting the evaluation function for weakening the competency level of a computer shogi program. ICGA J 40(1):15–31. https://doi.org/10.3233/ICG-180042

  19. Rosemarin H, Rosenfeld A (2019) Playing chess at a human desired level and style. In: The 7th Int. Conf. on Human-Agent Interact., pp 76–80. https://doi.org/10.1145/3349537.3351904

  20. McIlroy-Young R, Sen S, Kleinberg J, Anderson A (2020) Aligning superhuman AI with human behavior. In: The 26th ACM SIGKDD Int. Conf. on Knowl. Discovery & Data Mining, pp 1677–1687. https://doi.org/10.1145/3394486.3403219

  21. Jacob AP, Wu DJ, Farina G, Lerer A, Hu H, Bakhtin A, Andreas J, Brown N (2022) Modeling strong and human-like gameplay with KL-regularized search. In: Proceedings of the 39th international conference on machine learning. Proceedings of Machine Learning Research, vol 162, pp 9695–9728. https://proceedings.mlr.press/v162/jacob22a.html

  22. Baier H, Sattaur A, Powley EJ, Devlin S, Rollason J, Cowling PI (2019) Emulating human play in a leading mobile card game. IEEE Trans on Games 11(4):386–395. https://doi.org/10.1109/TG.2018.2835764

  23. Shi Y, Fan T, Li W, Hsueh C-H, Ikeda K (2021) Position control and production of various strategies for game of Go using deep learning methods. J of Inf Sci Eng 37(3):553–573. https://doi.org/10.6688/JISE.202105_37(3).0004

  24. Moon J, Choi Y, Park T, Choi J, Hong J-H, Kim K-J (2022) Diversifying dynamic difficulty adjustment agent by integrating player state models into Monte-Carlo tree search. Expert Syst Appl 205:117677. https://doi.org/10.1016/j.eswa.2022.117677

  25. official-stockfish (2008) GitHub - official-stockfish/Stockfish: UCI chess engine. https://github.com/official-stockfish/Stockfish. Accessed 18 Dec 2022

  26. Beal DF (1990) A generalised quiescence search algorithm. Artif Intell 43(1):85–98. https://doi.org/10.1016/0004-3702(90)90072-8


Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP20K12121, JP23K11381, and JP23K17021.

Author information

Contributions

Conceptualization: Kokolo Ikeda and Chu-Hsuan Hsueh; Methodology: Kokolo Ikeda and Chu-Hsuan Hsueh; Software: Chu-Hsuan Hsueh and Kokolo Ikeda; Validation: Chu-Hsuan Hsueh and Kokolo Ikeda; Formal analysis: Kokolo Ikeda and Chu-Hsuan Hsueh; Investigation: Chu-Hsuan Hsueh and Kokolo Ikeda; Resources: Kokolo Ikeda and Chu-Hsuan Hsueh; Data curation: Kokolo Ikeda and Chu-Hsuan Hsueh; Writing—original draft preparation: Chu-Hsuan Hsueh; Writing—review and editing: Kokolo Ikeda; Visualization: Chu-Hsuan Hsueh and Kokolo Ikeda; Supervision: Kokolo Ikeda; Project administration: Kokolo Ikeda; Funding acquisition: Kokolo Ikeda and Chu-Hsuan Hsueh. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chu-Hsuan Hsueh.

Ethics declarations

Ethical and informed consent for data used

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hsueh, CH., Ikeda, K. Improvement of move naturalness for playing good-quality games with middle-level players. Appl Intell 54, 1637–1655 (2024). https://doi.org/10.1007/s10489-023-05210-2
