Abstract
This paper proposes a new approach to the unsupervised learning of multiagent player systems operating in a high performance environment, in which cooperative agents are trained to be experts in specific stages of a game. The proposal is implemented in an automatic Checkers player called D-MA-Draughts, which is composed of 26 agents. The first is specialized in the initial and intermediate game stages, whereas the remaining 25 are specialists in endgame stages (defined as boards containing at most 12 pieces). Each agent consists of a multilayer neural network trained without human supervision through temporal difference methods. The best move is determined by the distributed search algorithm known as Young Brothers Wait Concept. Each endgame agent chooses moves for a particular profile of endgame board. These profiles are defined by a clustering process performed by a Kohonen-SOM network over a database of endgame boards retrieved from real matches. Once trained, the D-MA-Draughts agents can act in a match according to two distinct game dynamics. The D-MA-Draughts architecture extends two earlier versions: MP-Draughts, a multiagent system with a serial search algorithm, and D-VisionDraughts, a single agent with a distributed search algorithm. The gains of D-MA-Draughts are estimated through several tournaments against these earlier versions. The results show that D-MA-Draughts improves upon its predecessors by significantly reducing training time and endgame loops, beating them in several tournaments.
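The endgame-specialist selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the board encoding, the cluster count, the learning rate, and all function names are assumptions. It shows the core Kohonen-SOM mechanics, namely finding the best matching unit (BMU) to map an endgame board to one of the 25 specialist agents, and the prototype update used during clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 32   # one entry per playable square (illustrative encoding)
N_CLUSTERS = 25   # one prototype per endgame specialist agent

# Prototype vectors of the SOM, one per cluster (here randomly initialized;
# in the paper they result from training on real endgame boards).
prototypes = rng.normal(size=(N_CLUSTERS, N_FEATURES))

def encode_board(board):
    """Encode a board (sequence of 32 square values) as a float vector."""
    return np.asarray(board, dtype=float)

def select_endgame_agent(board):
    """Return the index of the specialist agent whose SOM prototype
    (best matching unit) is closest to the encoded board."""
    x = encode_board(board)
    distances = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(distances))

def som_train_step(x, lr=0.1):
    """One Kohonen update: move the best matching unit toward the sample.
    (A full SOM would also update the BMU's lattice neighbors.)"""
    bmu = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    prototypes[bmu] += lr * (x - prototypes[bmu])
    return bmu
```

Once the prototypes are trained, `select_endgame_agent` acts as a dispatcher: any board with at most 12 pieces is routed to the specialist whose cluster profile it matches.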
References
(2012) The message passing interface (MPI) standard. http://www.mcs.anl.gov/research/projects/mpi/
Barcelos ARA, Julia RMS, Matias R Jr (2011) D-VisionDraughts: a draughts player neural network that learns by reinforcement in a high performance environment. In: European symposium on artificial neural networks, computational intelligence and machine learning
Baxter J, Tridgell A, Weaver L (1998) KnightCap: a chess program that learns by combining TD(λ) with game-tree search. In: Proceedings of the 15th international conference on machine learning. Morgan Kaufmann, San Francisco, CA, pp 29–37
Brockington MG, Schaeffer J (2000) APHID: asynchronous parallel game-tree search. J Parallel Distrib Comput 60(2):247–273
Caexeta GS (2008) Visiondraughts - um sistema de aprendizagem de jogos de damas baseado em redes neurais, diferenças temporais, algoritmos eficientes de busca em árvores e informações perfeitas contidas em bases de dados. Master’s thesis, Federal University of Uberlandia, Uberlandia, Brazil
Caexeta GS, Julia RMS (2008) A draughts learning system based on neural networks and temporal differences: the impact of an efficient tree-search algorithm. In: The 19th Brazilian symposium on artificial intelligence, SBIA. LNAI series, Springer-Verlag
Campos P, Langlois T (2003) Abalearn: efficient self-play learning of the game Abalone. Technical report, INESC-ID, Neural Networks and Signal Processing Group
Cao Y, Yu W, Ren W, Chen G (2013) An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Ind Inf 9(1):427–438
Chellapilla K, Fogel DB (2000) Anaconda defeats Hoyle 6-0: a case study competing an evolved checkers program against commercially available software. In: Proceedings of the 2000 congress on evolutionary computation (CEC00), pp 857–863
Chellapilla K, Fogel DB (2001) Evolving an expert checkers playing program without using human expertise. IEEE Trans Evol Comput 5(4):422–428
Duarte VAR, Julia RMS (2012) MP-Draughts: ordering the search tree and refining the game board representation to improve a multi-agent system for draughts. In: IEEE international conference on tools with artificial intelligence (ICTAI)
Duarte VAR, Julia RMS, Barcelos ARA, Otsuka AB (2009) MP-Draughts: a multiagent reinforcement learning system based on MLP and Kohonen-SOM neural networks. In: IEEE international conference on systems, man, and cybernetics
Epstein SL (2001) Learning to play expertly: a tutorial on Hoyle. pp 153–178
Fierz MC (2016) Cake information. http://www.fierz.ch/cake.php (accessed 22 Nov 2016)
Fogel DB, Chellapilla K (2001) Verifying anaconda’s expert rating by competing against chinook: experiments in co-evolving a neural checkers player. Neurocomputing 42(1–4):69–86
Golpayegani F, Dusparic I, Taylor A, Clarke S (2016) Multi-agent collaboration for conflict management in residential demand response. Comput Commun 96:63–72
Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23(3):121–134
Hadzibeganovic T, Xia C (2016) Cooperation and strategy coexistence in a tag-based multi-agent system with contingent mobility. Knowl-Based Syst 112(Complete):1–13. doi:10.1016/j.knosys.2016.08.024
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall
van den Herik HJ, Uiterwijk JWHM, van Rijswijck J (2002) Games solved: now and in the future. Artif Intell 134(1-2):277–311
Kohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480
Kohonen T (2001) Self-organizing maps. Springer
Leouski A (1995) Learning of position evaluation in the game of othello. Tech. rep., available in: http://people.ict.usc.edu/leuski/publications
Levinson R, Weber R (2002) Chess neighborhoods, function combination, and reinforcement learning. In: Revised papers from the second international conference on computers and games. Springer, London, UK
Liu H, Zhang P, Hu B, Moore P (2015) A novel approach to task assignment in a cooperative multi-agent design system. Appl Intell 43(1):162–175
Lu CPP (1993) Parallel search of narrow game trees. Master’s thesis, University of Alberta
Lynch M (1997) Neurodraughts: an application of temporal difference learning to draughts. Master’s thesis, University of Limerick, Ireland
Lynch M, Griffith N (1997) NeuroDraughts: the role of representation, search, training regime and architecture in a TD draughts player. In: Eighth Ireland conference on artificial intelligence, pp 67–72
Manohararajah V (2001) Parallel alpha-beta search on shared memory multiprocessors. Master’s thesis, University of Toronto
McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
Millington I (2006) Artificial intelligence for games. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA
Moulin B, Chaib-Draa B (1996) An overview of distributed artificial intelligence. In: Foundations of distributed artificial intelligence. Wiley
Neto HC (2007) Ls-draughts - um sistema de aprendizagem de jogos de damas baseado em algoritmos genéticos, redes neurais e diferenças temporais. Master’s thesis, Federal University of Uberlandia, Uberlandia, Brazil
Neto HC, Julia RMS (2007) LS-Draughts - a draughts learning system based on genetic algorithms, neural network and temporal differences. In: Proceedings of the IEEE congress on evolutionary computation, CEC 2007, Singapore, pp 2523–2529
Neto HC, Julia RMS, Caexeta GS, Barcelos ARA (2014) Ls-visiondraughts: improving the performance of an agent for checkers by integrating computational intelligence, reinforcement learning and a powerful search method. Appl Intell 41(2):525–550. doi:10.1007/s10489-014-0536-y
Neumann JV, Morgenstern O (1944) Theory of games and economic behavior. Princeton University Press
Feldmann R, Mysliwietz P, Vornberger O, Monien B (1990) Distributed game tree search. In: Kumar V, Kanal LN, Gopalakrishnan PS (eds) Parallel algorithms for machine intelligence and vision, Springer, pp 66–101
Rosaci D (2007) Cilios: connectionist inductive learning and inter-ontology similarities for recommending information agents. Inf Syst 32(6):793–825
Rosaci D, Sarné GML (2011) Eva: an evolutionary approach to mutual monitoring of learning information agents. Appl Artif Intell 25(5):341–361
Rosaci D, Sarné GML (2013) Cloning mechanisms to improve agent performances. J Netw Comput Appl 36(1):402–408
Russell S, Norvig P (2003) Artificial intelligence: a modern approach, 2nd edn. Prentice Hall
Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229
Samuel AL (1967) Some studies in machine learning using the game of checkers ii. IBM J Res Dev 11 (6):601–617
Schaeffer J (1992) Man versus machine: the Silicon Graphics world checkers championship
Schaeffer J, Culberson J, Treloar N, Knight B, Lu P, Szafron D (1992) A world championship caliber checkers program. Artif Intell 53(2-3):273–289
Schaeffer J, Lake R, Lu P, Bryant M (1996) Chinook: the world man-machine checkers champion. AI Mag 17(1):21–30
Schaeffer J, Hlynka M, Jussila V (2001) Temporal difference learning applied to a high performance game-playing program. In: International joint conference on artificial intelligence, pp 529–534
Schaeffer J, Burch N, Bjornsson Y, Kishimoto A, Muller M, Lake R, Lu P, Sutphen S (2007) Checkers is solved. Science 317(5844):1518–1522
Schraudolph NN, Dayan P, Sejnowski TJ (2001) Learning to evaluate Go positions via temporal difference methods. In: Computational intelligence in games, studies in fuzziness and soft computing, vol 62. Springer
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44
Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6(2):215–219
Thrun S (1995) Learning to play the game of chess. In: Advances in neural information processing systems, vol 7. The MIT Press, pp 1069–1076
Tomaz LBP, Julia RMS, Barcelos ARA (2013) Improving the accomplishment of a neural network based agent for draughts that operates in a distributed learning environment. In: IEEE 14th international conference on information reuse and integration
Winands MHM (2004) Informed search in complex games. PhD thesis, Maastricht University
Wooldridge M (2009) An introduction to multiagent systems, 2nd edn. Wiley, New York, NY, USA
Wooldridge M (2001) Introduction to multiagent systems. Wiley, New York, NY, USA
Zobrist AL (1969) A hashing method with applications for game playing. Tech rep, University of Wisconsin, Madison
Cite this article
Paiva Tomaz, L.B., Silva Julia, R.M. & Duarte, V.A. A multiagent player system composed by expert agents in specific game stages operating in high performance environment. Appl Intell 48, 1–22 (2018). https://doi.org/10.1007/s10489-017-0952-x