
Adversarial neural networks for playing hide-and-search board game Scotland Yard

  • Original Article
Neural Computing and Applications

Abstract

This paper investigates the design of game-playing agents that automatically play Scotland Yard, an asymmetric hide-and-search board game with imperfect information. Neural network approaches have been developed to make the agents behave human-like, in the sense that they assess the game environment much as a human player would. Specifically, a thorough investigation has been conducted into the application of adversarial neural networks combined with Q-learning for designing the game-playing agents. The searchers, called detectives, and the hider, called Mister X (Mr. X), are modeled as neural network agents that play the game of Scotland Yard. Although it is a two-sided game, all five detectives must cooperate to capture the hider and win. A dedicated feature space has been designed for both the detectives and Mr. X to aid cooperation among the detectives. Rigorous experiments have been conducted, and the performance in each experiment has been recorded. The results demonstrate that the designed neural agents show promising performance in learning the game, cooperating, and working to win.



Notes

  1. \(l_i\) denotes the number of neurons in layer \(i\).


Author information


Corresponding author

Correspondence to Tirtharaj Dash.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest in relation to the publication of this article.

Additional information

Sahith N. Dambekodi and Preetham N. Reddy have contributed equally to this work.

Appendix 1: Some additional details

1.1 Scotland Yard: the implemented version

In the present work, we implement the original Scotland Yard board game. It bears some similarities to games of the same name played in Tokyo (Scotland Yard Tokyo) and in Switzerland (Scotland Yard - Swiss Edition). For clarity, we summarise the version implemented in this work. There are two teams: Mr. X (alone) and the five detectives out to catch him. Mr. X moves covertly, and his position is revealed only five times during the entire game. The detectives must cleverly corner and capture Mr. X before the game ends. Each location on the map is a node, numbered from 1 to 199. To travel between nodes, a player uses one of three transportation modes: taxi, bus, and subway, each of which has different connectivity across the map. Each of these services is paid for with tokens given to the players at the start of the game. We have not used double turns and waterways in our implementation. What makes the game more interesting is that after each move, a detective hands its used token to Mr. X, who can then spend it in his own covert movements. The result is more interaction between the two parties, and the detectives can affect the game in more than one way.
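As a minimal sketch of how such a board and token state could be represented: the node numbering (1 to 199), the three transport modes, and the token hand-over rule follow the description above, while the class and field names, adjacency structure, and helper functions are illustrative assumptions rather than the authors' actual implementation.

```python
# Sketch of a board/state representation for the implemented Scotland Yard
# variant described above. Node numbering (1-199), the three transport modes,
# and the token hand-over rule come from the text; class/field names and the
# adjacency structure are illustrative assumptions.
from collections import defaultdict

TRANSPORTS = ("taxi", "bus", "subway")

class Board:
    def __init__(self):
        # One adjacency map per transport mode: node -> set of reachable nodes.
        self.edges = {t: defaultdict(set) for t in TRANSPORTS}

    def add_link(self, transport, u, v):
        # Connections are undirected on the Scotland Yard map.
        self.edges[transport][u].add(v)
        self.edges[transport][v].add(u)

    def moves_from(self, node, tokens):
        """Yield (transport, destination) pairs playable with the given tokens."""
        for t in TRANSPORTS:
            if tokens.get(t, 0) > 0:
                for v in self.edges[t][node]:
                    yield t, v

def detective_move(det_tokens, mrx_tokens, transport):
    """After a detective moves, its used token is handed over to Mr. X."""
    det_tokens[transport] -= 1
    mrx_tokens[transport] = mrx_tokens.get(transport, 0) + 1
```

Under this representation, the hand-over rule is what couples the two sides: every detective move both consumes a detective resource and replenishes Mr. X's.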

1.2 Q-learning: configuration

We have used the following configuration for the Q-network. The detectives and Mr. X move using a static network while the game is played, and each of their moves is stored in the replay memory. At the end of the game, the appropriate reward is assigned according to which side won (either Mr. X or the detectives). During the subsequent learning phase, this reward is propagated backwards through every move made during the game; as we move backwards through the memory, the reward is diminished exponentially. In this way, greater weight is given to late-game moves.
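A rough sketch of this end-of-game credit assignment is given below; the discount factor and the replay-memory tuple format are assumptions for illustration, not values taken from the paper.

```python
# Sketch of the end-of-game credit assignment described above: the terminal
# reward is propagated backwards through the stored moves and diminished
# exponentially, so late-game moves receive the largest share.
# GAMMA and the (state, action) tuple format are illustrative assumptions.
GAMMA = 0.95

def assign_rewards(replay_memory, terminal_reward):
    """replay_memory: list of (state, action) pairs in the order they were played."""
    targets = []
    reward = terminal_reward
    for state, action in reversed(replay_memory):
        targets.append((state, action, reward))
        reward *= GAMMA          # diminish the reward as we move toward earlier turns
    targets.reverse()            # restore chronological order for training
    return targets
```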

Exploration is decreased linearly from 1 to 0 (as shown in the figures above) over the total number of turns in the game. Here, exploration is the probability that the agent takes a random move instead of using the network.
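The schedule amounts to standard epsilon-greedy action selection with a linearly decaying epsilon; the sketch below assumes a dictionary of Q-values keyed by legal move, which is an assumption about the network interface rather than the paper's code.

```python
# Sketch of the linearly decaying exploration probability described above.
# The q_values mapping (legal move -> predicted Q-value) is an assumed interface.
import random

def epsilon(turn, total_turns):
    """Probability of a random move, decreasing linearly from 1 to 0 over the game."""
    return max(0.0, 1.0 - turn / float(total_turns))

def select_move(turn, total_turns, legal_moves, q_values):
    if random.random() < epsilon(turn, total_turns):
        return random.choice(legal_moves)                 # explore
    return max(legal_moves, key=lambda m: q_values[m])    # exploit the network
```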

The reward system is +100 for a win and −100 for a loss. For the detectives, the reward is computed dynamically from each detective's distance to Mr. X when the game ends: the closer a detective is to Mr. X, the larger its fraction of the +100 reward in a win and the smaller its fraction of the −100 penalty in a loss.
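One way this distance-dependent split could look is sketched below; the inverse-distance weighting is an illustrative assumption, since the paper does not state the exact formula.

```python
# Sketch of the distance-weighted reward split for the detectives described
# above: +100 for a win, -100 for a loss, with a detective's share growing as
# its final distance to Mr. X shrinks. The inverse-distance weighting is assumed.
def detective_reward(distance_to_mrx, won, base=100.0):
    fraction = 1.0 / (1.0 + distance_to_mrx)   # closer detective -> larger fraction
    if won:
        return base * fraction                  # larger share of the +100 reward
    return -base * (1.0 - fraction)             # smaller share of the -100 penalty
```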


About this article

Cite this article

Dash, T., Dambekodi, S.N., Reddy, P.N. et al. Adversarial neural networks for playing hide-and-search board game Scotland Yard. Neural Comput & Applic 32, 3149–3164 (2020). https://doi.org/10.1007/s00521-018-3701-0
