
Learning and Exploiting Relative Weaknesses of Opponent Agents


Abstract

Agents in a competitive interaction can greatly benefit from adapting to a particular adversary, rather than using the same general strategy against all opponents. One method of such adaptation is opponent modeling, in which a model of an opponent is acquired and utilized as part of the agent's decision procedure in future interactions with this opponent. However, acquiring an accurate model of a complex opponent strategy may be computationally infeasible. In addition, if the learned model is not accurate, then using it to predict the opponent's actions may harm the agent's strategy rather than improve it. We therefore define the concept of opponent weakness, and present a method for learning a model of this simpler concept. We analyze examples of an opponent's past behavior in a particular domain, judging its actions using a trusted judge. We then infer a weakness model based on the opponent's actions relative to the domain state, and incorporate this model into our agent's decision procedure. We also make use of a similar self-weakness model, allowing the agent to prefer states in which the opponent is weak and our agent is strong; that is, states in which we have a relative advantage over the opponent. Experimental results spanning two different test domains demonstrate the agents' improved performance when making use of the weakness models.
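
The full method appears in the body of the article; as a rough illustration of the idea sketched in the abstract, the following Python fragment shows one way the pieces could fit together. All names here (extract_features, judge.is_poor_move, the decision-tree learner, the bonus weight) are our own assumptions, not the authors' algorithm: a trusted judge labels past opponent moves as weak or strong, a classifier generalizes those labels over state features, and the resulting opponent- and self-weakness models add a relative-advantage term to an ordinary evaluation function.

```python
# Minimal sketch of the weakness-model idea from the abstract.
# All names are hypothetical; this is not the paper's implementation.
from sklearn.tree import DecisionTreeClassifier

def extract_features(state):
    # Domain-specific feature extraction; stubbed here because the real
    # features depend on the test domain (e.g., board patterns).
    return list(state)

def train_weakness_model(past_moves, judge):
    """Learn a weakness model from examples of an agent's past behavior.
    `past_moves` is a list of (state, action) pairs; `judge` is a trusted
    judge (e.g., a deep search) that decides whether a move was poor."""
    X = [extract_features(state) for state, _ in past_moves]
    y = [1 if judge.is_poor_move(state, action) else 0  # 1 = weak play
         for state, action in past_moves]
    model = DecisionTreeClassifier(max_depth=5)
    model.fit(X, y)  # assumes both labels occur in the training data
    return model     # maps a state to P(the agent plays weakly there)

def evaluate(state, base_eval, opp_weakness, self_weakness, bonus=0.1):
    """Augment a standard evaluation function with a relative-advantage
    term: prefer states where the opponent is predicted to play weakly
    and our own agent is not."""
    f = [extract_features(state)]
    advantage = (opp_weakness.predict_proba(f)[0][1]
                 - self_weakness.predict_proba(f)[0][1])
    return base_eval(state) + bonus * advantage
```

In play, evaluate would replace the plain evaluation at the leaves of the agent's search, steering it toward states where the learned models predict a relative advantage.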



Author information


Corresponding author

Correspondence to Shaul Markovitch.


About this article

Cite this article

Markovitch, S., Reger, R. Learning and Exploiting Relative Weaknesses of Opponent Agents. Auton Agent Multi-Agent Syst 10, 103–130 (2005). https://doi.org/10.1007/s10458-004-6977-7

