
A unifying learning framework for building artificial game-playing agents

Annals of Mathematics and Artificial Intelligence

Abstract

This paper investigates learning-based agents capable of mimicking human behavior in game playing, a central task in computational economics. Although computational economists have developed various game-playing agents, well-established machine learning methods such as graphical models had not previously been applied to this task. Leveraging probabilistic graphical models, this paper presents a novel sequential Bayesian network (SBN) framework for building artificial game-playing agents. We show that many existing approaches, including reinforcement learning, fictitious play, and many of their variants, admit a unified Bayesian explanation within the proposed SBN framework. Moreover, SBN can handle various important settings of game playing, allowing for a broad scope of use in economics. SBN not only provides a unifying and satisfying framework that explains existing learning approaches in virtual economies, but also enables the development of new algorithms that are stronger or operate under fewer restrictions. From the generic SBN model we derive a new algorithm, Hidden Markovian Play (HMP), which handles an important but difficult setting in which a player cannot observe the opponent's strategy and payoff. HMP leverages Markovian learning to infer this unobservable information, leading to higher-quality agents. Experiments on data from real-world field experiments in economics show that HMP outperforms baseline algorithms for building artificial agents.
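The core idea behind HMP, as the abstract describes it, is to treat the opponent's unobservable strategy as a hidden state of a Markov process and infer it from observed actions. The following is a minimal illustrative sketch of that style of Bayesian filtering; the transition matrix, emission model, and two-mode opponent are invented for illustration and are not the paper's actual HMP specification.

```python
import numpy as np

def filter_belief(belief, transition, emission_probs):
    """One Bayesian filtering step: predict the hidden state with the
    Markov transition, then condition on the observed action's likelihood."""
    predicted = transition.T @ belief       # prior over the next hidden state
    posterior = predicted * emission_probs  # weight by observation likelihood
    return posterior / posterior.sum()      # renormalize to a distribution

# Hypothetical opponent with two hidden strategy "modes"; each mode
# emits one of two actions with different probabilities.
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])   # P(next mode | current mode)
emission = np.array([[0.8, 0.2],     # mode 0: mostly plays action 0
                     [0.3, 0.7]])    # mode 1: mostly plays action 1

belief = np.array([0.5, 0.5])        # uninformative initial belief
for observed_action in [0, 0, 1, 1, 1]:
    belief = filter_belief(belief, transition, emission[:, observed_action])

print(belief)  # belief shifts toward the mode that explains recent actions
```

After observing a run of action 1, the belief concentrates on mode 1; an agent could then best-respond against the inferred strategy rather than against raw action frequencies, which is the advantage the abstract claims over approaches that require observing the opponent's strategy directly.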



Author information

Corresponding author

Correspondence to Wenlin Chen.

About this article

Cite this article

Chen, W., Chen, Y. & Levine, D.K. A unifying learning framework for building artificial game-playing agents. Ann Math Artif Intell 73, 335–358 (2015). https://doi.org/10.1007/s10472-015-9450-1
