Abstract
A classification bandit problem is a pure-exploration K-armed bandit problem in which, for a given positive integer \(L(\le K)\) and a given \(\delta >0\), one must decide whether a given set of K arms contains at least L good arms, correctly with probability at least \(1-\delta \), while drawing as few arm samples as possible; here, an arm is good if and only if its expected reward \(\mu \) is at least a given threshold \(\xi \). To make algorithms for this problem applicable to a wider range of real-world tasks, we extend the problem from one-dimensional rewards to multi-dimensional rewards by defining a good arm as one whose i-th dimensional expected reward \(\mu _i\) is at least a given threshold \(\xi _i\) for every dimension i. We also extend the P-Tracking algorithm, which has been reported to be the best performer for the original one-dimensional-reward problem, to our multi-dimensional-reward problem. Our numerical simulations demonstrate that the extended P-Tracking algorithm is more sample-efficient than extensions of other existing algorithms.
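To make the multi-dimensional good-arm criterion concrete, here is a minimal Python sketch (the function names are ours, not from the paper): with full knowledge of the mean-reward vectors, it checks whether a set of arms contains at least L good arms. The bandit algorithm itself must, of course, infer this answer from samples rather than from known means.

```python
def is_good(mu, xi):
    """An arm is good iff its expected reward meets the threshold
    in every dimension: mu_i >= xi_i for all i."""
    return all(m >= x for m, x in zip(mu, xi))

def contains_L_good_arms(arm_means, xi, L):
    """Decide whether the arm set contains at least L good arms --
    the yes/no question the classification bandit must answer."""
    n_good = sum(is_good(mu, xi) for mu in arm_means)
    return n_good >= L

# Example: 3 arms with 2-dimensional expected rewards,
# thresholds xi = (0.5, 0.4); only the first arm is good.
arms = [(0.6, 0.5), (0.7, 0.3), (0.4, 0.9)]
print(contains_L_good_arms(arms, (0.5, 0.4), L=1))  # True
print(contains_L_good_arms(arms, (0.5, 0.4), L=2))  # False
```

With L = 1 the answer is yes (arm 1 clears both thresholds), while with L = 2 it is no, since arms 2 and 3 each fall below one of the thresholds.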
Acknowledgement
This work was supported by JSPS KAKENHI Grant Number JP24H00685.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Suzuki, R., Nakamura, A. (2025). Posterior Tracking Algorithm for Multi-objective Classification Bandits. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science(), vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science, Computer Science (R0)