Abstract
A classification bandit problem is a pure-exploration K-armed bandit problem in which, for a given positive integer \(L(\le K)\) and a given \(\delta >0\), one must decide whether a given set of K arms contains at least L good arms, correctly with probability at least \(1-\delta \), while drawing as few arm samples as possible; here, an arm is good if and only if its expected reward \(\mu \) is at least a given threshold \(\xi \). To make algorithms for this problem applicable to a wider range of real-world tasks, we extend the problem from one-dimensional rewards to multi-dimensional rewards by defining a good arm as one whose i-th dimensional expected reward \(\mu _i\) is at least a given threshold \(\xi _i\) for every dimension i. We also extend the P-Tracking algorithm, which has been reported to be the best performer for the original one-dimensional-reward problem, to our multi-dimensional-reward problem. Our numerical simulations demonstrate that the extended P-Tracking algorithm is more sample-efficient than extensions of other existing algorithms.
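To make the multi-dimensional good-arm criterion concrete, here is a minimal Python sketch (the function names are ours, not from the paper): with full knowledge of the mean-reward vectors, it checks whether a set of arms contains at least L good arms. The bandit algorithm itself must, of course, infer this answer from samples rather than from known means.

```python
def is_good(mu, xi):
    """An arm is good iff its expected reward meets the threshold
    in every dimension: mu_i >= xi_i for all i."""
    return all(m >= x for m, x in zip(mu, xi))

def contains_L_good_arms(arm_means, xi, L):
    """Decide whether the arm set contains at least L good arms --
    the yes/no question the classification bandit must answer."""
    n_good = sum(is_good(mu, xi) for mu in arm_means)
    return n_good >= L

# Example: 3 arms with 2-dimensional expected rewards,
# thresholds xi = (0.5, 0.4); only the first arm is good.
arms = [(0.6, 0.5), (0.7, 0.3), (0.4, 0.9)]
print(contains_L_good_arms(arms, (0.5, 0.4), L=1))  # True
print(contains_L_good_arms(arms, (0.5, 0.4), L=2))  # False
```

With L = 1 the answer is yes (arm 1 clears both thresholds), while with L = 2 it is no, since arms 2 and 3 each fall below one of the thresholds.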
Acknowledgement
This work was supported by JSPS KAKENHI Grant Number JP24H00685.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Suzuki, R., Nakamura, A. (2025). Posterior Tracking Algorithm for Multi-objective Classification Bandits. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science(), vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0350-3
Online ISBN: 978-981-96-0351-0
eBook Packages: Computer Science, Computer Science (R0)