
Adversarial neural networks for playing hide-and-search board game Scotland Yard

  • Original Article
Neural Computing and Applications

Abstract

This paper investigates the design of game-playing agents that automatically play Scotland Yard, an asymmetric hide-and-search board game with imperfect information. Neural network approaches have been developed to make the agents behave human-like, in the sense that they assess the game environment much as a human player would. Specifically, a thorough investigation has been conducted into the application of adversarial neural networks combined with Q-learning for designing the game-playing agents. The searchers, called detectives, and the hider, called Mister X (Mr. X), are modeled as neural network agents that play the game of Scotland Yard. Although it is a two-sided game, all five detectives must cooperate to capture the hider and win. A dedicated feature space has been designed for both the detectives and Mr. X to aid cooperation among the detectives. Rigorous experiments have been conducted, and the performance in each experiment has been recorded. The results demonstrate that the designed neural agents show promising performance in learning the game, cooperating, and working to win.



Notes

  1. \(l_i\) denotes the number of neurons in layer \(i\).


Author information


Corresponding author

Correspondence to Tirtharaj Dash.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest in relation to the publication of this article.

Additional information

Sahith N. Dambekodi and Preetham N. Reddy have contributed equally to this work.

Appendix 1: Some additional details

1.1 Scotland Yard: the implemented version

In the present work, we implement the original Scotland Yard board game. It bears some similarities to games of the same name played in Tokyo (Scotland Yard Tokyo) and in Switzerland (Scotland Yard - Swiss Edition). For clarity, we summarise the version implemented in this work. There are two teams: Mr. X (alone) and the five detectives out to catch him. Mr. X moves covertly, and his position is revealed only five times during the entire game. The detectives must cleverly corner and capture Mr. X before the game ends. Each location on the map is a node, numbered from 1 to 199. To travel between nodes, a player uses one of three transportation modes: taxi, bus, and subway, each of which has different connectivity across the map. Each of these services is paid for with tokens given to the players at the start of the game. We have not used double turns and waterways in our implementation. What makes the game more interesting is that after each move, a detective hands its used token to Mr. X, who can then spend it in his own covert movements. The result is more interaction between the two parties, and the detectives can affect the game in more than one way.
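As a minimal sketch of how such a board and token state could be represented: the node numbering (1 to 199), the three transport modes, and the token hand-over rule follow the description above, while the class and field names, adjacency structure, and helper functions are illustrative assumptions rather than the authors' actual implementation.

```python
# Sketch of a board/state representation for the implemented Scotland Yard
# variant described above. Node numbering (1-199), the three transport modes,
# and the token hand-over rule come from the text; class/field names and the
# adjacency structure are illustrative assumptions.
from collections import defaultdict

TRANSPORTS = ("taxi", "bus", "subway")

class Board:
    def __init__(self):
        # One adjacency map per transport mode: node -> set of reachable nodes.
        self.edges = {t: defaultdict(set) for t in TRANSPORTS}

    def add_link(self, transport, u, v):
        # Connections are undirected on the Scotland Yard map.
        self.edges[transport][u].add(v)
        self.edges[transport][v].add(u)

    def moves_from(self, node, tokens):
        """Yield (transport, destination) pairs playable with the given tokens."""
        for t in TRANSPORTS:
            if tokens.get(t, 0) > 0:
                for v in self.edges[t][node]:
                    yield t, v

def detective_move(det_tokens, mrx_tokens, transport):
    """After a detective moves, its used token is handed over to Mr. X."""
    det_tokens[transport] -= 1
    mrx_tokens[transport] = mrx_tokens.get(transport, 0) + 1
```

Under this representation, the hand-over rule is what couples the two sides: every detective move both consumes a detective resource and replenishes Mr. X's.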

1.2 Q-learning: configuration

We have used the following configuration for the Q-network. The detectives and Mr. X move using a static network while the game is played, and each of their moves is stored in the replay memory. At the end of the game, the appropriate reward is assigned according to which side won (either Mr. X or the detectives). During the subsequent learning phase, this reward is propagated backwards through every move made during the game; as we move backwards through the memory, the reward is diminished exponentially. In this way, greater weight is given to late-game moves.
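A rough sketch of this end-of-game credit assignment is given below; the discount factor and the replay-memory tuple format are assumptions for illustration, not values taken from the paper.

```python
# Sketch of the end-of-game credit assignment described above: the terminal
# reward is propagated backwards through the stored moves and diminished
# exponentially, so late-game moves receive the largest share.
# GAMMA and the (state, action) tuple format are illustrative assumptions.
GAMMA = 0.95

def assign_rewards(replay_memory, terminal_reward):
    """replay_memory: list of (state, action) pairs in the order they were played."""
    targets = []
    reward = terminal_reward
    for state, action in reversed(replay_memory):
        targets.append((state, action, reward))
        reward *= GAMMA          # diminish the reward as we move toward earlier turns
    targets.reverse()            # restore chronological order for training
    return targets
```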

Exploration is decreased linearly from 1 to 0 (as shown in the figures above) over the total number of turns in the game. Here, exploration is the probability that the agent takes a random move instead of using the network.
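The schedule amounts to standard epsilon-greedy action selection with a linearly decaying epsilon; the sketch below assumes a dictionary of Q-values keyed by legal move, which is an assumption about the network interface rather than the paper's code.

```python
# Sketch of the linearly decaying exploration probability described above.
# The q_values mapping (legal move -> predicted Q-value) is an assumed interface.
import random

def epsilon(turn, total_turns):
    """Probability of a random move, decreasing linearly from 1 to 0 over the game."""
    return max(0.0, 1.0 - turn / float(total_turns))

def select_move(turn, total_turns, legal_moves, q_values):
    if random.random() < epsilon(turn, total_turns):
        return random.choice(legal_moves)                 # explore
    return max(legal_moves, key=lambda m: q_values[m])    # exploit the network
```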

The reward system is +100 for a win and −100 for a loss. For the detectives, the reward is computed dynamically from each detective's distance to Mr. X when the game ends: the closer a detective is to Mr. X, the larger its fraction of the +100 reward in a win and the smaller its fraction of the −100 penalty in a loss.
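One way this distance-dependent split could look is sketched below; the inverse-distance weighting is an illustrative assumption, since the paper does not state the exact formula.

```python
# Sketch of the distance-weighted reward split for the detectives described
# above: +100 for a win, -100 for a loss, with a detective's share growing as
# its final distance to Mr. X shrinks. The inverse-distance weighting is assumed.
def detective_reward(distance_to_mrx, won, base=100.0):
    fraction = 1.0 / (1.0 + distance_to_mrx)   # closer detective -> larger fraction
    if won:
        return base * fraction                  # larger share of the +100 reward
    return -base * (1.0 - fraction)             # smaller share of the -100 penalty
```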


About this article

Cite this article

Dash, T., Dambekodi, S.N., Reddy, P.N. et al. Adversarial neural networks for playing hide-and-search board game Scotland Yard. Neural Comput & Applic 32, 3149–3164 (2020). https://doi.org/10.1007/s00521-018-3701-0
