DOI: 10.1145/3489525.3511685

The Cost of Reinforcement Learning for Game Engines: The AZ-Hive Case-study

Published: 09 April 2022

Abstract

Although utilising computers to play board games has been a topic of research for many decades, the recent rapid developments in reinforcement learning, such as AlphaZero and its variants, have brought unprecedented progress in games such as chess and Go. However, the efficiency of this process remains unknown. In this work, we analyse the cost and efficiency of the AlphaZero approach when building a new game engine. To this end, we present our experience building AZ-Hive, an AlphaZero-based playing engine for the game of Hive. Using only the rules of the game and a quality-of-play assessment, AZ-Hive learns to play the game from scratch. Getting AZ-Hive up and running requires encoding the game in AlphaZero, i.e., capturing the board, the game state, the rules, and the assessment of play quality. Different encodings lead to significantly different AZ-Hive engines, with very different performance results. We therefore propose a design space for configuring AZ-Hive, and demonstrate the costs and benefits of different configurations in this space. We find that different configurations yield more or less competitive playing engines, but training and evaluating each such engine is prohibitively expensive, and no systematic, efficient exploration or pruning of the space is possible. As a result, an exhaustive exploration can easily take tens of training-years.
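To make concrete what "encoding the game in AlphaZero" involves, the sketch below shows the kind of game adapter that AlphaZero-style frameworks (for instance, the open-source alpha-zero-general project) expect a new game to implement. This is a minimal illustration under assumptions of our own, not the authors' implementation: the class name HiveGame, the board-tensor shape, and the flat action indexing are all hypothetical, and each such choice is one axis of the kind of configuration space the paper explores.

    import numpy as np

    class HiveGame:
        """Hypothetical AlphaZero adapter for Hive (illustrative only).

        Every constant below is a design decision: how to clamp Hive's
        unbounded hexagonal board into a fixed tensor, how to index moves,
        and how to score terminal positions. Per the paper, different
        choices produce measurably different engines.
        """

        BOARD_SIZE = 14    # assumed: clamp the hex grid to 14x14 axial coordinates
        PIECE_PLANES = 22  # assumed: one plane per physical piece (11 pieces x 2 colours)

        def get_init_board(self):
            # The policy/value network consumes the state as a stack of planes.
            return np.zeros((self.PIECE_PLANES, self.BOARD_SIZE, self.BOARD_SIZE),
                            dtype=np.int8)

        def get_action_size(self):
            # One index per (piece plane, destination cell): placements and
            # slides share a single flat action space.
            return self.PIECE_PLANES * self.BOARD_SIZE * self.BOARD_SIZE

        def get_valid_moves(self, board, player):
            # Rule encoding: a 0/1 mask over the action space enforcing the
            # sliding rules, the one-hive rule, queen-placement deadlines, etc.
            raise NotImplementedError

        def get_game_ended(self, board, player):
            # Quality-of-play assessment at the leaves: +1 if `player` has
            # surrounded the opposing queen, -1 if its own queen is
            # surrounded, and a small non-zero value for a draw so the
            # search does not mistake a draw for an unfinished game.
            raise NotImplementedError

Under this framing, each AZ-Hive configuration fixes one combination of such encoding choices and must then be trained and evaluated from scratch, which is why exploring the space is so expensive.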

Cited By

  • (2023) Advancements in Artificial Intelligence Circuits and Systems (AICAS). Electronics 13(1), 102. https://doi.org/10.3390/electronics13010102. Online publication date: 26 December 2023.

Published In

ICPE '22: Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering
April 2022
242 pages
ISBN: 9781450391436
DOI: 10.1145/3489525

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AlphaZero: Hive
  2. computational cost
  3. design space exploration
  4. energy efficiency
  5. game-playing engines
  6. reinforcement learning

Qualifiers

  • Short-paper

Conference

ICPE '22

Acceptance Rates

ICPE '22 paper acceptance rate: 14 of 58 submissions (24%)
Overall acceptance rate: 252 of 851 submissions (30%)

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 4
Reflects downloads up to 17 Feb 2025
