DOI: 10.1145/3631991.3632010
research-article
Open access

Information spillover effect on human exploratory behavior in contextual multi-armed bandit problem

Published: 26 December 2023

Abstract

Recently, the upper confidence bound (UCB) strategy, combined with belief updating by a Gaussian process, has received much attention as a model of human exploratory behavior in vast decision spaces. However, a major drawback of this model is that it retains the independence from irrelevant alternatives (IIA) property. This property implies that the evaluation of one alternative (arm) is determined independently of its relationship with other alternatives (arms), which eliminates the information spillover effect. In the contextual bandit setting in particular, this property appears to be a strong restriction. In this study, we first present an empirical example in which the IIA property does not hold. Next, we propose a modification of the UCB model in which the search bonus is given by the information gain from the alternatives rather than by their uncertainty. Information gain is well known as an efficient search criterion in active learning, and it accounts for the information spillover effect. Our empirical results show that this information spillover effect is an important guide for human search in vast spaces.
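The contrast drawn in the abstract, between a UCB bonus that depends only on an arm's own uncertainty and an information-gain bonus that also depends on how an arm relates to the others, can be illustrated with a small numerical sketch. This is not the authors' model. It assumes arms are points on a one-dimensional feature axis, an RBF kernel (length-scale 1, noise variance 0.1), a softmax (multinomial logit) choice rule with bonus weight 1, and invented toy data. The spillover-aware bonus is a hypothetical choice: for arm j it sums, over all arms i, the mutual information I(y_j; f_i) = -½ log(1 - corr(y_j, f_i)²) between a noisy observation of arm j and the value of arm i. All function names are illustrative. The sketch checks whether the choice odds between arms a and b change when a third arm c, similar to b, is added.

```python
import numpy as np

# Toy data (assumed for illustration): two observed rewards on a 1-D feature axis.
X_obs = np.array([[0.0], [2.0]])
y_obs = np.array([1.0, 0.5])
NOISE_VAR = 0.1


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(X_arms, noise_var=NOISE_VAR, length_scale=1.0):
    """GP posterior mean vector and covariance matrix of f at the arms."""
    K_aa = rbf_kernel(X_arms, X_arms, length_scale)
    K_oa = rbf_kernel(X_obs, X_arms, length_scale)
    K_oo = rbf_kernel(X_obs, X_obs, length_scale) + noise_var * np.eye(len(X_obs))
    alpha = np.linalg.solve(K_oo, K_oa)  # K_oo^{-1} K_oa
    mean = alpha.T @ y_obs
    cov = K_aa - K_oa.T @ alpha
    return mean, cov


def ucb_bonus(cov):
    """GP-UCB bonus: each arm's own posterior standard deviation."""
    return np.sqrt(np.clip(np.diag(cov), 0.0, None))


def info_gain_bonus(cov, noise_var=NOISE_VAR):
    """Hypothetical spillover-aware bonus: for arm j, sum over all arms i of
    I(y_j; f_i) = -0.5 * log(1 - corr(y_j, f_i)^2), where y_j = f_j + noise."""
    var = np.clip(np.diag(cov), 1e-12, None)
    rho2 = cov**2 / ((var[:, None] + noise_var) * var[None, :])  # row j, column i
    rho2 = np.clip(rho2, 0.0, 1.0 - 1e-12)
    return np.sum(-0.5 * np.log1p(-rho2), axis=1)


def choice_probs(values, temperature=1.0):
    """Softmax (multinomial logit) choice rule over arm values."""
    z = values / temperature
    p = np.exp(z - z.max())
    return p / p.sum()


def odds_a_over_b(X_arms, bonus_fn, beta=1.0):
    """P(choose arm 0) / P(choose arm 1), valuing arms as mean + beta * bonus."""
    mean, cov = gp_posterior(X_arms)
    p = choice_probs(mean + beta * bonus_fn(cov))
    return p[0] / p[1]


arms_ab = np.array([[1.0], [3.0]])          # arms a and b
arms_abc = np.array([[1.0], [3.0], [3.2]])  # add arm c, similar to b

ucb_ab, ucb_abc = odds_a_over_b(arms_ab, ucb_bonus), odds_a_over_b(arms_abc, ucb_bonus)
ig_ab, ig_abc = odds_a_over_b(arms_ab, info_gain_bonus), odds_a_over_b(arms_abc, info_gain_bonus)
print(f"UCB odds a/b: {ucb_ab:.4f} without c, {ucb_abc:.4f} with c")
print(f"IG  odds a/b: {ig_ab:.4f} without c, {ig_abc:.4f} with c")
```

Under UCB, the posterior mean and variance of a and b do not depend on which other arms are in the choice set, so the softmax odds of a over b stay the same when c is added, which is the IIA property. Under the hypothetical information-gain bonus, observing b now also tells the learner about the similar arm c, so b's bonus rises and the odds shift toward b. Note that the diagonal term I(y_j; f_j) = ½ log(1 + σ_j²/σ_n²) is a monotone function of the arm's own variance; on its own it would rank arms like UCB, and the spillover enters only through the cross-arm terms.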

      Published In

      ACM Other conferences
      WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering
      September 2023
      352 pages
      ISBN:9798400708053
      DOI:10.1145/3631991
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Exploration-exploitation dilemma
      2. Human exploratory behavior
      3. Independence from irrelevant alternatives
      4. Information gain
      5. Information spillover effect
      6. Multi-armed bandit problem

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WSSE 2023

      Article Metrics

      • Total citations: 0
      • Total downloads: 96
      • Downloads (last 12 months): 95
      • Downloads (last 6 weeks): 20
      Reflects downloads up to 26 Jan 2025
