Posterior Tracking Algorithm for Multi-objective Classification Bandits

  • Conference paper
AI 2024: Advances in Artificial Intelligence (AI 2024)

Abstract

A classification bandit problem is a pure-exploration K-armed bandit problem in which, for a given positive integer \(L \le K\) and \(\delta > 0\), one must decide whether a given set of K arms contains at least L good arms, answering correctly with probability at least \(1-\delta\) while drawing arms as few times as possible. An arm is good if and only if its expected reward \(\mu\) is at least a given threshold \(\xi\). To make algorithms for this problem applicable to more diverse real-world problems, we extend the problem from one-dimensional rewards to multi-dimensional rewards by defining a good arm as one whose i-th dimensional expected reward \(\mu_i\) is at least a given threshold \(\xi_i\) for every dimension i. We also extend the P-Tracking algorithm, which is reported to be one of the best performers on the original one-dimensional-reward problem, to our multi-dimensional-reward problem. Numerical simulations demonstrate that the extended P-Tracking algorithm is more sample-efficient than extensions of other existing algorithms.
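
The multi-dimensional definition of a good arm can be illustrated with a minimal Python sketch. It only encodes the ground-truth question that an algorithm must answer and is not the P-Tracking algorithm itself. The function names `good_arms` and `has_at_least_L_good_arms` and the example values are hypothetical and do not come from the paper. In the actual problem the expected rewards are unknown and must be estimated by drawing arms.

```python
import numpy as np


def good_arms(mu, xi):
    """Boolean mask over arms: arm k is good iff mu[k, i] >= xi[i] for every dimension i.

    mu: array-like of shape (K, d), expected rewards (assumed known here for illustration).
    xi: array-like of shape (d,), per-dimension thresholds.
    """
    mu = np.asarray(mu, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return np.all(mu >= xi, axis=1)


def has_at_least_L_good_arms(mu, xi, L):
    """Ground-truth answer to the classification question for a given L."""
    return int(good_arms(mu, xi).sum()) >= L


# Hypothetical instance with K = 3 arms and d = 2 reward dimensions.
mu = [[0.7, 0.6],   # good: both dimensions reach the threshold
      [0.8, 0.3],   # not good: second dimension is below its threshold
      [0.5, 0.5]]   # good: equality with the threshold counts
xi = [0.5, 0.5]

mask = good_arms(mu, xi)
answer_L2 = has_at_least_L_good_arms(mu, xi, L=2)
answer_L3 = has_at_least_L_good_arms(mu, xi, L=3)
```

Note that an arm with a high reward in one dimension, such as the second arm above, is still not good if any other dimension falls below its threshold.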

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP24H00685.

Author information

Corresponding author

Correspondence to Atsuyoshi Nakamura.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Suzuki, R., Nakamura, A. (2025). Posterior Tracking Algorithm for Multi-objective Classification Bandits. In: Gong, M., Song, Y., Koh, Y.S., Xiang, W., Wang, D. (eds) AI 2024: Advances in Artificial Intelligence. AI 2024. Lecture Notes in Computer Science, vol 15443. Springer, Singapore. https://doi.org/10.1007/978-981-96-0351-0_7

  • DOI: https://doi.org/10.1007/978-981-96-0351-0_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0350-3

  • Online ISBN: 978-981-96-0351-0

  • eBook Packages: Computer Science, Computer Science (R0)
