Opponent Modelling by Sequence Prediction and Lookahead in Two-Player Games

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7895)

Abstract

Learning a strategy that maximises total reward in a multi-agent system is a hard problem when the reward depends on other agents’ strategies. Many previous approaches consider opponents that are reactive and memoryless. In this paper, we use sequence prediction algorithms to perform opponent modelling in two-player games, so that opponents with memory can be modelled. We argue that competing with opponents with memory requires lookahead. We combine these algorithms with reinforcement learning and lookahead action selection, allowing them to find strategies that maximise total reward up to a limited depth. Experiments confirm that lookahead is required, and show that these algorithms successfully model and exploit opponent strategies with different memory lengths. The proposed approach outperforms popular and state-of-the-art reinforcement learning algorithms in terms of learning speed and final performance.
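The idea in the abstract can be illustrated with a minimal sketch (not the paper's implementation): a simple frequency-table predictor stands in for the paper's sequence prediction algorithms, and exhaustive depth-limited search over action plans stands in for its lookahead action selection. The game, the memory-1 opponent, and all names below are illustrative assumptions.

```python
import itertools

ACTS = (0, 1, 2)  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

class FreqPredictor:
    """Toy sequence predictor: per-context counts of the opponent's moves."""
    def __init__(self):
        self.counts = {}

    def update(self, ctx, opp_move):
        self.counts.setdefault(ctx, [0, 0, 0])[opp_move] += 1

    def predict(self, ctx):
        c = self.counts.get(ctx)
        return max(ACTS, key=lambda m: c[m]) if c else 0

def lookahead(pred, ctx, depth):
    """Score every depth-step plan against the opponent model;
    return the first action of the plan with the highest predicted reward."""
    best_total, best_first = float("-inf"), 0
    for plan in itertools.product(ACTS, repeat=depth):
        total, c = 0, ctx
        for a in plan:
            total += payoff(a, pred.predict(c))
            c = (a,)  # against a memory-1 opponent, the next context
                      # is the move we just played
        if total > best_total:
            best_total, best_first = total, plan[0]
    return best_first

# Opponent with memory length 1: plays the move that beats our previous move.
pred, our_prev, score = FreqPredictor(), 0, 0
for _ in range(200):
    opp = (our_prev + 1) % 3
    a = lookahead(pred, (our_prev,), depth=2)
    score += payoff(a, opp)
    pred.update((our_prev,), opp)
    our_prev = a
```

After a few exploratory rounds the predictor captures the opponent's context-conditional response, and the lookahead player wins nearly every subsequent round. A purely reactive best-response learner with no opponent model would not reliably exploit such a memory-based strategy, which is the gap the paper addresses.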




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mealing, R., Shapiro, J.L. (2013). Opponent Modelling by Sequence Prediction and Lookahead in Two-Player Games. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_36

  • DOI: https://doi.org/10.1007/978-3-642-38610-7_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38609-1

  • Online ISBN: 978-3-642-38610-7

  • eBook Packages: Computer Science (R0)
