Scalarized Lower Upper Confidence Bound Algorithm

Drugan, Mădălina M.

doi:10.1007/978-3-319-19084-6_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8994)

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

Abstract

Multi-objective evolutionary optimisation algorithms and stochastic multi-armed bandit techniques are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient exploration/exploitation trade-off. The lower upper confidence bound (LUCB) algorithm focuses on sampling the arms that are most likely to be misclassified as optimal or suboptimal, in order to identify the set of best arms, also known as the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) adapts LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm in a bi-objective environment.
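
The idea of applying LUCB-style sampling to scalarized reward vectors can be illustrated with a minimal sketch. It is not the sMO-LUCB algorithm of the paper, whose details are not given in the abstract. The sketch makes these assumptions: linear scalarization with non-negative weights summing to 1, independent Bernoulli rewards per objective, a Hoeffding-style confidence radius, and identification of a single best arm per weight vector. All function and parameter names (`scalarize`, `confidence_radius`, `lucb_best_arm`, `epsilon`, `delta`) are hypothetical.

```python
import math
import random


def scalarize(reward_vector, weights):
    """Linear scalarization: weighted sum of the objective values."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def confidence_radius(pulls, t, n_arms, delta):
    """Hoeffding-style radius; an assumed form, not taken from the paper."""
    return math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * pulls))


def lucb_best_arm(arm_means, weights, epsilon=0.1, delta=0.1,
                  max_rounds=20000, seed=0):
    """Return (best arm index, total pulls) for one weight vector.

    arm_means[i] holds the Bernoulli success probability of arm i in each
    objective. At least two arms are required.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)

    def pull(i):
        # Draw one reward vector and reduce it to a scalar in [0, 1].
        vec = [1.0 if rng.random() < p else 0.0 for p in arm_means[i]]
        return scalarize(vec, weights)

    # Pull every arm once to initialise the empirical means.
    counts = [1] * n_arms
    sums = [pull(i) for i in range(n_arms)]
    total = n_arms

    for t in range(1, max_rounds + 1):
        means = [s / c for s, c in zip(sums, counts)]
        radii = [confidence_radius(c, t, n_arms, delta) for c in counts]
        # Leader: arm with the highest empirical scalarized mean.
        leader = max(range(n_arms), key=lambda i: means[i])
        # Challenger: other arm with the highest upper confidence bound.
        challenger = max((i for i in range(n_arms) if i != leader),
                         key=lambda i: means[i] + radii[i])
        # Stop once the challenger's upper bound is within epsilon of
        # the leader's lower bound.
        gap = (means[challenger] + radii[challenger]) \
            - (means[leader] - radii[leader])
        if gap < epsilon:
            return leader, total
        # Otherwise sample the two arms most likely to be misclassified.
        for i in (leader, challenger):
            sums[i] += pull(i)
            counts[i] += 1
            total += 1
    return leader, total


# Hypothetical bi-objective environment: arms 0 and 1 are Pareto optimal,
# arm 2 is dominated by both.
arms = [(0.9, 0.1), (0.1, 0.9), (0.05, 0.05)]
front = {lucb_best_arm(arms, w)[0] for w in [(1.0, 0.0), (0.0, 1.0)]}
print(sorted(front))
```

Running the routine once per weight vector and taking the union of the returned arms gives an estimate of the Pareto front under these assumptions. Linear scalarization can only recover Pareto optimal arms on the convex hull of the front, so other scalarization functions would be needed for non-convex fronts.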

Acknowledgements

Mădălina M. Drugan was supported by the IWT-SBO project PERPETUAL (grant no. 110041) and the FWO project “Multi-criteria RL” (grant no. G. 087814N).

Author information

Authors and Affiliations

Artificial Intelligence Lab, Vrije Universiteit Brussels, Pleinlaan 2, 1050-B, Brussels, Belgium
Mădălina M. Drugan

Corresponding author

Correspondence to Mădălina M. Drugan.

Editor information

Editors and Affiliations

Lille University, Villeneuve d'Ascq, France
Clarisse Dhaenens
Lille University, Villeneuve d'Ascq, France
Laetitia Jourdan
Lille University, Villeneuve d'Ascq, France
Marie-Eléonore Marmion

About this paper

Cite this paper

Drugan, M.M. (2015). Scalarized Lower Upper Confidence Bound Algorithm. In: Dhaenens, C., Jourdan, L., Marmion, ME. (eds) Learning and Intelligent Optimization. LION 2015. Lecture Notes in Computer Science, vol 8994. Springer, Cham. https://doi.org/10.1007/978-3-319-19084-6_21

DOI: https://doi.org/10.1007/978-3-319-19084-6_21
Published: 29 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19083-9
Online ISBN: 978-3-319-19084-6
eBook Packages: Computer Science, Computer Science (R0)
