Abstract
This paper presents LS-VisionDraughts, an efficient unsupervised evolutionary learning system for Checkers whose contribution is to automate, by means of Evolutionary Computation, the selection of an appropriate representation for the board states while keeping a deep look-ahead (search depth) when choosing a move. The player is a Multilayer Perceptron Neural Network whose weights are updated through an evaluation function that is automatically adjusted by Temporal Difference methods. A Genetic Algorithm automatically chooses a concise and efficient set of functions describing various scenarios associated with Checkers, called features, to represent the board states in the input layer of the Neural Network. Thus, each individual of the Genetic Algorithm is a candidate set of features associated with a distinct Multilayer Perceptron Neural Network. The output of the Neural Network is a real number (prediction) indicating to what extent the input state is favorable to the agent. In LS-VisionDraughts, a particular version of the Alpha-Beta search algorithm, called fail-soft Alpha-Beta, combined with a Transposition Table, Iterative Deepening and move ordering, uses this prediction value to choose the best move for the current board state. The best individual is selected through numerous tournaments among these Neural Networks. The architecture of LS-VisionDraughts is inspired by the agent NeuroDraughts; however, the former enhances the performance of the latter by automating the selection of the features through Evolutionary Computation and by replacing NeuroDraughts' Minimax search algorithm with the improved search strategy summarized above. This replacement reduces the search runtime by 95% and remarkably increases the search tree depth.
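The search strategy described above combines fail-soft Alpha-Beta pruning, a transposition table, and iterative deepening. The following minimal Python sketch shows how these pieces fit together; the names (`GameState`, `evaluate`) and the toy game tree are illustrative assumptions, not taken from the paper, and a real engine would use Zobrist hashing for table keys and store bound flags (exact/lower/upper) rather than raw values.

```python
import math

class GameState:
    """Toy game tree: each node lists its children; leaves carry a value."""
    def __init__(self, value=0, children=None):
        self.value = value
        self.children = children or []

def evaluate(state):
    # Stand-in for the neural network's prediction of the state's value,
    # from the perspective of the side to move (negamax convention).
    return state.value

def fail_soft_alpha_beta(state, depth, alpha, beta, table):
    # Real engines key the table by a Zobrist hash of the position
    # (cf. Zobrist 1969); id() is a placeholder for this sketch.
    key = id(state)
    cached = table.get(key)
    if cached is not None and cached[0] >= depth:
        return cached[1]
    if depth == 0 or not state.children:
        return evaluate(state)
    best = -math.inf
    # Move ordering would search previously promising moves first,
    # tightening the (alpha, beta) window sooner.
    for child in state.children:
        score = -fail_soft_alpha_beta(child, depth - 1, -beta, -alpha, table)
        if score > best:
            best = score  # fail-soft: best may fall outside (alpha, beta)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent will avoid this line
    table[key] = (depth, best)
    return best

def iterative_deepening(state, max_depth):
    # Each shallower iteration fills the transposition table,
    # speeding up (and helping order) the next, deeper one.
    table = {}
    best = None
    for depth in range(1, max_depth + 1):
        best = fail_soft_alpha_beta(state, depth, -math.inf, math.inf, table)
    return best
```

As a usage sketch, `iterative_deepening(root, d)` returns the negamax value of `root` after searching to depth `d`, reusing shallower results via the table; the simplification of caching cutoff scores as if exact is acceptable for illustration but not for a tournament engine.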
The results obtained from evaluative tournaments confirm the advances of LS-VisionDraughts over its opponents. It is important to point out that LS-VisionDraughts learns practically without human supervision, in contrast to the current machine world champion Chinook, which was built in a strongly supervised manner.
References
Abdoos M, Mozayani N, Bazzan ALC (2014) Hierarchical control of traffic signals using q-learning with tile coding. Appl Intell 40(2):201–213
Al-Khateeb B, Kendall G (2012) Effect of look-ahead depth in evolutionary checkers. J Comput Sci Tech 27(5):996–1006
Al-Khateeb B, Kendall G (2012) Introducing individual and social learning into evolutionary checkers. IEEE Trans Comput Intell AI Games 258–269
Barcelos ARA, Julia RMS, Matias R Jr (2011) D-VisionDraughts: a draughts player neural network that learns by reinforcement in a high performance environment. In: European symposium on artificial neural networks, computational intelligence and machine learning
Baxter J, Trigdell A, Weaver L (1998) Knightcap: a chess program that learns by combining TD(λ) with game-tree search. In: Proceedings 15th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 29–37
Breuker D, Uiterwijk J, Herik H (1994) Replacement schemes for transposition tables. Technical report. Available in: http://citeseer.ist.psu.edu/112066.html
Caexeta GS, Julia RMS (2008) A draughts learning system based on neural networks and temporal differences: the impact of an efficient tree-search algorithm. In: The 19th Brazilian symposium on artificial intelligence, SBIA, LNAI series of Springer-Verlag
Campos P, Langlois T (2003) Abalearn: Efficient self-play learning of the game abalone. In: INESC-ID, neural networks and signal processing group
Cheheltani SH, Ebadzadeh MM (2012) Immune based fuzzy agent plays checkers game. Appl Soft Comput 12(8):2227–2236
Darwen PJ (2001) Why co-evolution beats temporal difference learning at backgammon for a linear architecture, but not a non-linear architecture. In: Proceedings of the 2001 congress on evolutionary computation CEC2001. IEEE Press, pp 1003–1010
Derrac J, Garcia S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
Duarte VAR, Julia RMS (2009) MP-Draughts: a multiagent reinforcement learning system based on MLP and Kohonen-SOM neural networks. In: IEEE international conference on systems, man and cybernetics, pp 2270–2275
Dutta PK (1999) Strategies and games: theory and practice. MIT Press, Cambridge
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer
Epstein S (2001) Learning to play expertly: a tutorial on hoyle. In: Machines that learn to play games. Nova Science Publishers, Huntington
Fierz MC (2008) Cake 1.85. Technical report. Available in: http://www.fierz.ch/checkers.htm
Fierz MC (2012) Checkerboard program - version 1.72. Technical report. Available in http://www.fierz.ch/checkerboard.php
Fogel DB (2002) Blondie24: playing at the edge of AI. Morgan Kaufmann, San Francisco
Fogel DB, Chellapilla K (2001) Verifying anaconda’s expert rating by competing against chinook: experiments in co-evolving a neural checkers player. Neurocomputing 42(1–4):69–86
Fogel DB, Hays TJ, Hahn SL, Quon J (2004) A self-learning evolutionary chess program. Proc IEEE 92(12):1947–1954
Gilbert E (2000) Kingsrow. Technical report. Available in: http://edgilbert.org/Checkers/KingsRow.htm
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall
Herik HJV, Uiterwijk JW, Rijswijck JV (2002) Games solved: now and in the future. Artif Intell 134(1-2):277–311
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, 2nd edn. MIT Press
Kim KJ, Cho SB (2005) Systematically incorporating domain-specific knowledge into evolutionary speciated checkers players. IEEE Trans Evol Comput 9(6):615–627
Kortenkamp D, Bonasso RP, Murphy R (1988) AI-based mobile robots: case studies of successful robot systems. MIT Press
Leouski A (1995) Learning of position evaluation in the game of othello. Technical Report. Available in: http://people.ict.usc.edu/leuski/publications
Levinson R, Weber R (2002) Chess neighborhoods, function combination, and reinforcement learning. In: Revised papers from the second international conference on computers and games. Springer, London
Lynch M (1997) An application of temporal difference learning to draughts. Master’s Thesis, University of Limerick, Ireland
Lynch M, Griffith N (1997) Neurodraughts: the role of representation, search, training regime and architecture in a td draughts player. In: 8th Ireland conference on artificial intelligence, pp 67–72
Marsland TA (1986) A review of game-tree pruning. In: International computer chess association journal, pp 3–19
McCarthy JL, Feigenbaum EA (1990) In memoriam - Arthur Samuel: pioneer in machine learning. AI Mag 11(3):10–11
McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
Mendonca M, Arruda LVR, Junior FN (2012) Autonomous navigation system using event driven-fuzzy cognitive maps. Appl Intell 37(2):175–188
Millington I (2006) Artificial intelligence for games. Morgan Kaufmann, San Francisco
Neto HC, Julia RMS (2007) LS-Draughts - a draughts learning system based on genetic algorithms, neural network and temporal differences. In: Proceedings of the IEEE congress on evolutionary computation, CEC 2007. Singapore, pp 2523–2529
Neto HC, Julia RMS, Caexeta GS (2009) LS-Draughts: using databases to treat endgame loop in a hybrid evolutionary learning system. In: Theory and novel application of machine learning. I-Tech Education and Publishing
Neumann JV, Morgenstern O (1944) Theory of games and economic behavior. Princeton University Press. Available in: http://en.wikipedia.org/wiki/Theory_of_Games_and_Economic_Behavior
Plaat A (1996) Research re: search & re-search. Ph.D. Thesis, Rotterdam, The Netherlands
Plaat A, Schaeffer J, Pijls W, Bruin A (1995) A new paradigm for minimax search
Ribeiro CHC, Monteiro ST (2003) Navigation learning in map building for autonomous mobile robots. In: 4th National meeting of artificial intelligence (ENIA). Article title translated from original version in portuguese
Richards N, Moriarty DE, Miikkulainen R (1998) Evolving neural networks to play go. Appl Intell 8(1):85–96
Russell S, Norvig P (2003) Artificial intelligence: a modern approach, 2nd edn. Prentice Hall
Salcedo-Sanz S, Matias-Roman JM, Jimenez-Fernandez S, Portilla-Figueras A, Cuadra L (2013) An evolutionary-based hyper-heuristic approach for the jawbreaker puzzle. Appl Intell 1–11
Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229
Samuel AL (1967) Some studies in machine learning using the game of checkers II. IBM J Res Dev 11(6):601–617
Schaeffer J (2002) Applying the experience of building a high performance search engine for one domain to another
Schaeffer J, Hlynka M, Jussila V (2001) Temporal difference learning applied to a high performance game-playing program. In: International joint conference on artificial intelligence, pp 529–534
Schaeffer J, Lake R, Lu P, Bryant M (1996) Chinook: the world man-machine checkers champion. AI Mag 17(1):21–30
Schraudolph NN, Dayan P, Sejnowski TJ (2001) Learning to evaluate go positions via temporal difference methods. In: Computational intelligence in games studies in fuzziness and soft computing, vol 62. Springer
Shams R, Kaindl H, Horacek H (1991) Using aspiration windows for minimax algorithms. In: Proceedings of the 12th international joint conference on artificial intelligence. Morgan Kaufmann, pp 192–197
Silver D, Sutton RS, Muller M (2012) Temporal-difference search in computer go. Mach Learn 87(2):183–219
Slate DJ, Atkin LR (1977) Chess skill in man and machine. Springer
Stevanovic R (2007) Quantum random bit generator service. Technical report. Available in: http://random.irb.hr
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44
Tesauro G (1992) Practical issues in temporal difference learning. In: Advances in neural information processing systems, vol 4. Morgan Kaufmann, pp 259–266
Tesauro G (1994) Td-gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6(2):215–219
Tesauro G (1995) Temporal difference learning and td-gammon. Commun ACM 38(3):58–68
Thrun S (1995) Learning to play the game of chess. In: Advances in neural information processing systems 7. The MIT Press, pp 1069–1076
Tucker AW (1950) Prisoner's dilemma problem. Technical report. Available in: http://www.answers.com/topic/prisoner-s-dilemma
Walker MA (2000) An application of reinforcement learning to dialogue strategy in a spoken dialogue system for email. Artif Intell Res 12:387–416
Burgard W, Fox D, Jans H, Matenar C, Thrun S (1999) Sonar-based mapping with mobile robots using EM. In: Proceedings 16th international conference on machine learning
Wiering M (2000) Multi-agent reinforcement learning for traffic light control. In: 17th international conference on machine learning, pp 1151–1158
Zobrist AL (1969) A hashing method with applications for game playing. Technical report
Neto, H.C., Julia, R.M.S., Caexeta, G.S. et al. LS-VisionDraughts: improving the performance of an agent for checkers by integrating computational intelligence, reinforcement learning and a powerful search method. Appl Intell 41, 525–550 (2014). https://doi.org/10.1007/s10489-014-0536-y