Scalarized Lower Upper Confidence Bound Algorithm

  • Conference paper
Learning and Intelligent Optimization (LION 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8994)

Abstract

Multi-objective evolutionary optimisation algorithms and stochastic multi-armed bandit techniques are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient trade-off between exploration and exploitation. The lower upper confidence bound (LUCB) algorithm concentrates its sampling on the arms that are most likely to be misclassified (i.e., labelled optimal when they are suboptimal, or vice versa) in order to identify the set of best arms, also known as the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) adapts LUCB to reward vectors. Preliminary experiments show that the proposed algorithm performs well in a bi-objective environment.
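The abstract gives no pseudocode, so the following is only a hedged illustration of the idea, not the paper's sMO-LUCB. It assumes (i) linear scalarization with a hand-picked set of weight vectors, (ii) an LUCB1-style Hoeffding confidence radius, and (iii) independent Bernoulli rewards per objective. The names `pareto_optimal`, `lucb_best_arm` and `smo_lucb`, and the example arm means, are hypothetical. For each weight vector the sketch repeatedly samples the empirical best arm and its most optimistic challenger (the two arms most likely to be misclassified) until their confidence intervals separate; the union of the per-scalarization winners is then taken as an estimate of the Pareto-optimal arms.

```python
import math
import random


def pareto_optimal(means):
    """Return the indices of arms whose mean vectors no other arm Pareto-dominates."""
    def dominates(a, b):
        # a dominates b: at least as good in every objective, strictly better in one.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return {i for i, mi in enumerate(means)
            if not any(dominates(mj, mi) for j, mj in enumerate(means) if j != i)}


def lucb_best_arm(pull, n_arms, weights, delta=0.01, eps=0.0, max_rounds=10**6):
    """LUCB1-style search for the single best arm under one linear scalarization.

    pull(i) returns a reward vector in [0, 1]^D. The weights are non-negative and
    sum to 1, so each scalarized reward lies in [0, 1] and Hoeffding's bound applies.
    Returns (best_arm, number_of_pulls).
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def sample(i):
        reward = pull(i)
        counts[i] += 1
        sums[i] += sum(w * r for w, r in zip(weights, reward))

    for i in range(n_arms):  # initialise with one pull per arm
        sample(i)
    t = n_arms
    while True:
        means = [s / c for s, c in zip(sums, counts)]
        # Confidence radius with a union bound over arms and rounds (k1 = 5/4).
        log_term = math.log(1.25 * n_arms * t ** 4 / delta)
        beta = [math.sqrt(log_term / (2 * c)) for c in counts]
        h = max(range(n_arms), key=lambda i: means[i])  # empirical best arm
        l = max((i for i in range(n_arms) if i != h),
                key=lambda i: means[i] + beta[i])  # most optimistic challenger
        # Stop once the challenger's upper bound drops below h's lower bound (up to eps).
        if (means[l] + beta[l]) - (means[h] - beta[h]) < eps or t >= max_rounds:
            return h, sum(counts)
        sample(h)  # h and l are the two arms most likely to be misclassified
        sample(l)
        t += 1


def smo_lucb(pull, n_arms, weight_set, delta=0.01):
    """Hypothetical scalarized wrapper: one LUCB run per weight vector.

    The union of the winners estimates the Pareto-optimal arms; delta is split
    evenly across the scalarizations (union bound over the runs).
    """
    found, total = set(), 0
    for w in weight_set:
        best, pulls = lucb_best_arm(pull, n_arms, w, delta / len(weight_set))
        found.add(best)
        total += pulls
    return found, total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bi-objective instance: arm 3 is dominated by arm 2.
    MEANS = [(0.9, 0.1), (0.1, 0.9), (0.7, 0.7), (0.3, 0.3)]

    def pull(i):
        # Independent Bernoulli reward for each objective.
        return tuple(1.0 if rng.random() < p else 0.0 for p in MEANS[i])

    weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    found, n_pulls = smo_lucb(pull, len(MEANS), weights)
    print("identified:", sorted(found), "pareto-optimal:", sorted(pareto_optimal(MEANS)),
          "pulls:", n_pulls)
```

A linear scalarization can only single out Pareto-optimal arms that lie on the convex hull of the front; arms in concave regions of the front would need a non-linear scalarization such as the Chebyshev function.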

Acknowledgements

Madalina M. Drugan was supported by the IWT-SBO project PERPETUAL (grant no. 110041) and the FWO project “Multi-criteria RL” (grant no. G. 087814N).

Author information

Corresponding author

Correspondence to Mădălina M. Drugan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Drugan, M.M. (2015). Scalarized Lower Upper Confidence Bound Algorithm. In: Dhaenens, C., Jourdan, L., Marmion, ME. (eds) Learning and Intelligent Optimization. LION 2015. Lecture Notes in Computer Science, vol. 8994. Springer, Cham. https://doi.org/10.1007/978-3-319-19084-6_21

  • DOI: https://doi.org/10.1007/978-3-319-19084-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19083-9

  • Online ISBN: 978-3-319-19084-6