
Learning and Exploiting Relative Weaknesses of Opponent Agents


Abstract

Agents in a competitive interaction can greatly benefit from adapting to a particular adversary, rather than using the same general strategy against all opponents. One method of such adaptation is opponent modeling, in which a model of an opponent is acquired and utilized as part of the agent's decision procedure in future interactions with this opponent. However, acquiring an accurate model of a complex opponent strategy may be computationally infeasible. In addition, if the learned model is not accurate, then using it to predict the opponent's actions may harm the agent's strategy rather than improve it. We therefore define the concept of opponent weakness, and present a method for learning a model of this simpler concept. We analyze examples of an opponent's past behavior in a particular domain, judging its actions using a trusted judge. We then infer a weakness model based on the opponent's actions relative to the domain state, and incorporate this model into our agent's decision procedure. We also make use of a similar self-weakness model, allowing the agent to prefer states in which the opponent is weak and our agent is strong; that is, states in which we have a relative advantage over the opponent. Experimental results spanning two different test domains demonstrate the agents' improved performance when making use of the weakness models.
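
The full method appears in the body of the article; as a rough illustration of the idea sketched in the abstract, the following Python fragment shows one way the pieces could fit together. All names here (extract_features, judge.is_poor_move, the decision-tree learner, the bonus weight) are our own assumptions, not the authors' algorithm: a trusted judge labels past opponent moves as weak or strong, a classifier generalizes those labels over state features, and the resulting opponent- and self-weakness models add a relative-advantage term to an ordinary evaluation function.

```python
# Minimal sketch of the weakness-model idea from the abstract.
# All names are hypothetical; this is not the paper's implementation.
from sklearn.tree import DecisionTreeClassifier

def extract_features(state):
    # Domain-specific feature extraction; stubbed here because the real
    # features depend on the test domain (e.g., board patterns).
    return list(state)

def train_weakness_model(past_moves, judge):
    """Learn a weakness model from examples of an agent's past behavior.
    `past_moves` is a list of (state, action) pairs; `judge` is a trusted
    judge (e.g., a deep search) that decides whether a move was poor."""
    X = [extract_features(state) for state, _ in past_moves]
    y = [1 if judge.is_poor_move(state, action) else 0  # 1 = weak play
         for state, action in past_moves]
    model = DecisionTreeClassifier(max_depth=5)
    model.fit(X, y)  # assumes both labels occur in the training data
    return model     # maps a state to P(the agent plays weakly there)

def evaluate(state, base_eval, opp_weakness, self_weakness, bonus=0.1):
    """Augment a standard evaluation function with a relative-advantage
    term: prefer states where the opponent is predicted to play weakly
    and our own agent is not."""
    f = [extract_features(state)]
    advantage = (opp_weakness.predict_proba(f)[0][1]
                 - self_weakness.predict_proba(f)[0][1])
    return base_eval(state) + bonus * advantage
```

In play, evaluate would replace the plain evaluation at the leaves of the agent's search, steering it toward states where the learned models predict a relative advantage.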



Author information


Corresponding author

Correspondence to Shaul Markovitch.


About this article

Cite this article

Markovitch, S., Reger, R. Learning and Exploiting Relative Weaknesses of Opponent Agents. Auton Agent Multi-Agent Syst 10, 103–130 (2005). https://doi.org/10.1007/s10458-004-6977-7

