Information spillover effect on human exploratory behavior in contextual multi-armed bandit problem

Authors:

Shinji Nakazato,

Tetsuya Shimokawa

WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering

Pages 128 - 134

https://doi.org/10.1145/3631991.3632010

Published: 26 December 2023

Abstract

Recently, the upper confidence bound (UCB) strategy combined with Gaussian process belief updating has received much attention as a model of human exploratory behavior in vast decision spaces. However, a major drawback of this model is that it retains the independence from irrelevant alternatives (IIA) property. This property implies that each alternative (arm) is evaluated independently of its relationship to the other alternatives, which rules out any information spillover effect. In contextual bandit problems in particular, this appears to be a strong restriction. In this study, we first present an empirical example in which the IIA property does not hold. We then propose a modification of the UCB model in which the search bonus is given by the information gain from an alternative rather than by the uncertainty of that alternative. Information gain is widely known as an efficient search criterion in active learning, and it accounts for the information spillover effect. Our empirical results show that this information spillover effect is an important guide for human search in vast spaces.
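The contrast between the two search bonuses can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the noise variance, the helper names (`rbf_kernel`, `gp_posterior`, `uncertainty_bonus`, `spillover_bonus`), and the specific information-gain formula are all assumptions made for illustration. Here the spillover bonus for arm i is taken to be the mutual information between a noisy observation of arm i and the latent rewards of all other arms, which is one common criterion from the active-learning literature; the paper may define its bonus differently.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel between two sets of context vectors (assumed kernel)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_obs, y_obs, X_arms, noise_var=0.1, length_scale=1.0):
    """Posterior mean and covariance of the latent rewards at every arm."""
    K_aa = rbf_kernel(X_arms, X_arms, length_scale)
    if len(X_obs) == 0:
        return np.zeros(len(X_arms)), K_aa  # zero-mean prior, no data yet
    K_oo = rbf_kernel(X_obs, X_obs, length_scale) + noise_var * np.eye(len(X_obs))
    K_ao = rbf_kernel(X_arms, X_obs, length_scale)
    mean = K_ao @ np.linalg.solve(K_oo, y_obs)
    cov = K_aa - K_ao @ np.linalg.solve(K_oo, K_ao.T)
    return mean, cov

def uncertainty_bonus(cov):
    """Standard UCB bonus: each arm's own posterior standard deviation."""
    return np.sqrt(np.clip(np.diag(cov), 0.0, None))

def spillover_bonus(cov, noise_var=0.1, jitter=1e-8):
    """Assumed information-gain bonus: I(y_i; f_rest) for each arm i.

    For jointly Gaussian variables this equals
    0.5 * log((var_i + noise) / (var_i_given_rest + noise)).
    """
    n = cov.shape[0]
    bonus = np.empty(n)
    for i in range(n):
        rest = np.delete(np.arange(n), i)
        c = cov[rest, i]
        C = cov[np.ix_(rest, rest)] + jitter * np.eye(n - 1)
        cond_var = max(cov[i, i] - c @ np.linalg.solve(C, c), 0.0)
        bonus[i] = 0.5 * np.log((cov[i, i] + noise_var) / (cond_var + noise_var))
    return bonus

# Hypothetical one-dimensional contexts: three similar arms and one isolated arm.
arms = np.array([[0.0], [0.3], [0.6], [10.0]])
mean, cov = gp_posterior(np.empty((0, 1)), np.empty(0), arms)
beta = 1.0  # assumed exploration weight
print("UCB score, uncertainty bonus:", mean + beta * uncertainty_bonus(cov))
print("UCB score, spillover bonus:  ", mean + beta * spillover_bonus(cov))
```

Under the prior, all four arms carry the same uncertainty bonus, so the standard UCB score cannot distinguish them. With the assumed spillover bonus, the isolated arm scores essentially zero, because observing it says almost nothing about the others, while the three clustered arms score higher because an observation of one informs beliefs about its neighbors.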

Index Terms

Information spillover effect on human exploratory behavior in contextual multi-armed bandit problem
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Empirical studies in HCI
    2. HCI design and evaluation methods
      1. Heuristic evaluations

Published In

WSSE '23: Proceedings of the 2023 5th World Symposium on Software Engineering

September 2023

352 pages

ISBN:9798400708053

DOI:10.1145/3631991

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WSSE 2023

WSSE 2023: 2023 The 5th World Symposium on Software Engineering

September 22 - 24, 2023

Tokyo, Japan
