Abstract
Learning a strategy that maximises total reward in a multi-agent system is hard when that reward depends on the other agents' strategies. Many previous approaches consider opponents that are reactive and memoryless. In this paper, we use sequence prediction algorithms to perform opponent modelling in two-player games, so that opponents with memory can be modelled. We argue that lookahead is required to compete against opponents with memory. We combine these algorithms with reinforcement learning and lookahead action selection, allowing them to find strategies that maximise total reward up to a limited lookahead depth. Experiments confirm that lookahead is required and show that these algorithms successfully model and exploit opponent strategies with different memory lengths. The proposed approach outperforms popular and state-of-the-art reinforcement learning algorithms in both learning speed and final performance.
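The abstract gives no implementation details. The following is therefore only a minimal Python sketch, under assumptions of our own, of the general idea: a sequence predictor estimates the opponent's next action from recent play, and a depth-limited lookahead picks the action with the highest predicted total reward. The predictor here is a simple order-1 frequency counter over joint actions, a much simpler stand-in for the sequence prediction algorithms the paper uses. The prisoner's dilemma payoffs, the tit-for-tat opponent, the fixed warm-up schedule and every name (`ContextPredictor`, `lookahead_value`, `play`, `EXPLORE`) are hypothetical. The reinforcement learning component is omitted (leaf values are zero).

```python
"""Sketch: opponent modelling by sequence prediction plus depth-limited lookahead.

Assumptions (not from the paper): iterated prisoner's dilemma payoffs,
a tit-for-tat opponent, an order-1 frequency-count predictor over joint
actions, a fixed exploration warm-up, and zero-valued leaves (no RL).
"""

C, D = "C", "D"
ACTIONS = (C, D)

# Row player's payoff for (my_action, opponent_action); standard PD values (assumed).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


class ContextPredictor:
    """Hypothetical fixed-order predictor: P(opponent action | last `order` joint actions)."""

    def __init__(self, order=1):
        self.order = order
        self.counts = {}  # context tuple -> {opponent action: count}

    def _context(self, history):
        return tuple(history[-self.order:]) if self.order > 0 else ()

    def predict(self, history):
        seen = self.counts.get(self._context(history), {})
        # Add-one (Laplace) smoothing: an unseen context predicts uniformly.
        smoothed = {a: seen.get(a, 0) + 1 for a in ACTIONS}
        total = sum(smoothed.values())
        return {a: n / total for a, n in smoothed.items()}

    def update(self, history, opp_action):
        seen = self.counts.setdefault(self._context(history), {})
        seen[opp_action] = seen.get(opp_action, 0) + 1


def lookahead_value(model, history, depth):
    """Return (expected total reward over `depth` moves, best first action).

    Opponent replies are weighted by the model's predictions; the model is
    not updated during search. Ties favour C (first in ACTIONS).
    """
    if depth == 0:
        return 0.0, None
    probs = model.predict(history)
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        value = 0.0
        for o, p in probs.items():
            future, _ = lookahead_value(model, history + [(a, o)], depth - 1)
            value += p * (PAYOFF[(a, o)] + future)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


def tit_for_tat(history):
    """Opponent with memory length 1: cooperate first, then copy our last move."""
    return C if not history else history[-1][0]


EXPLORE = (C, C, D, D)  # hypothetical warm-up schedule that visits every joint context


def play(depth, rounds=100, warmup=8):
    model = ContextPredictor(order=1)
    history = []
    for t in range(rounds):
        if t < warmup:
            a = EXPLORE[t % len(EXPLORE)]
        else:
            _, a = lookahead_value(model, history, depth)
        o = tit_for_tat(history)
        model.update(history, o)  # learn from the context seen before this round
        history.append((a, o))
    return history


def score(history):
    return sum(PAYOFF[joint] for joint in history)


if __name__ == "__main__":
    for depth in (1, 2):
        h = play(depth)
        print(f"depth {depth}: total {score(h)}, last moves {''.join(a for a, _ in h[-10:])}")
```

Against tit-for-tat, defecting pays more in any single round, so the depth-1 (greedy) agent defects after the warm-up and settles into mutual defection (total 113 over 100 rounds). With depth 2, the agent can see that cooperating now makes the opponent cooperate next round, so it cooperates for every round after the warm-up (total 294). This mirrors, in miniature, the abstract's argument that lookahead is needed against opponents with memory.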
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mealing, R., Shapiro, J.L. (2013). Opponent Modelling by Sequence Prediction and Lookahead in Two-Player Games. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_36
DOI: https://doi.org/10.1007/978-3-642-38610-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38609-1
Online ISBN: 978-3-642-38610-7
eBook Packages: Computer Science, Computer Science (R0)