Abstract
We evaluate the asymptotic performance of boundedly-rational strategies in multi-armed bandit problems, where performance is measured by the tendency, in the limit, to play optimal actions either (i) in isolation or (ii) in networks of other learners. We show that, for many strategies commonly employed in economics, psychology, and machine learning, performance in isolation and performance in networks are essentially unrelated. Our results suggest that the performance of many common boundedly-rational strategies depends crucially on the social context (if any) in which they are employed.
Cite this article
Mayo-Wilson, C., Zollman, K. & Danks, D. Wisdom of crowds versus groupthink: learning in groups and in isolation. Int J Game Theory 42, 695–723 (2013). https://doi.org/10.1007/s00182-012-0329-7
DOI: https://doi.org/10.1007/s00182-012-0329-7