Abstract
Techniques from multi-objective evolutionary optimisation and stochastic multi-armed bandits are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient trade-off between exploration and exploitation. The lower upper confidence bound (LUCB) algorithm focuses its sampling on the arms that are most likely to be misclassified as optimal or suboptimal, in order to identify the set of best arms, i.e., the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) adapts LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm on a bi-objective environment.
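To make the idea concrete, the following is a minimal sketch of a generic LUCB-style loop applied to linearly scalarized reward vectors. It is not the sMO-LUCB procedure of the paper, whose details are not given in this abstract. The function names (`scalarize`, `bernoulli_arm`, `lucb_scalarized`), the Hoeffding-style confidence radius, the parameters `epsilon`, `delta` and `max_rounds`, the example arm means, and the choice to take the union of the best arms over a few weight vectors are all assumptions made for illustration. Note that linear scalarization can only recover arms on the convex part of the Pareto front.

```python
import math
import random


def scalarize(reward, weights):
    """Linear scalarization: weighted sum of a reward vector."""
    return sum(w * r for w, r in zip(weights, reward))


def bernoulli_arm(means):
    """Hypothetical arm: independent Bernoulli reward per objective."""
    return lambda rng: tuple(float(rng.random() < p) for p in means)


def lucb_scalarized(arms, weights, m=1, epsilon=0.2, delta=0.1,
                    max_rounds=10_000, rng=None):
    """Return indices of the m empirically best arms for one weight vector.

    Assumes scalarized rewards lie in [0, 1] and 1 <= m < len(arms).
    """
    rng = rng or random.Random(0)
    n = len(arms)
    assert 1 <= m < n
    counts = [0] * n
    sums = [0.0] * n

    def pull(i):
        sums[i] += scalarize(arms[i](rng), weights)
        counts[i] += 1

    for i in range(n):  # initialise: pull every arm once
        pull(i)

    for t in range(1, max_rounds + 1):
        means = [sums[i] / counts[i] for i in range(n)]

        def radius(i):
            # Assumed Hoeffding-style confidence radius, shrinking with counts[i].
            return math.sqrt(math.log(5 * n * t ** 4 / (4 * delta)) / (2 * counts[i]))

        order = sorted(range(n), key=lambda i: means[i], reverse=True)
        top, rest = order[:m], order[m:]
        # The two arms most likely to be misclassified:
        weakest_top = min(top, key=lambda i: means[i] - radius(i))
        strongest_rest = max(rest, key=lambda i: means[i] + radius(i))
        gap = (means[strongest_rest] + radius(strongest_rest)
               - (means[weakest_top] - radius(weakest_top)))
        if gap < epsilon:  # confidence intervals are separated up to epsilon
            break
        pull(weakest_top)
        pull(strongest_rest)
    return sorted(top)


# Bi-objective toy environment (made-up means): arms 0 and 1 are Pareto optimal.
arms = [bernoulli_arm((0.9, 0.1)), bernoulli_arm((0.1, 0.9)), bernoulli_arm((0.2, 0.2))]
front = set()
for w in [(1.0, 0.0), (0.0, 1.0)]:
    front.update(lucb_scalarized(arms, w, rng=random.Random(0)))
print(sorted(front))
```

In each round only two arms are sampled: the member of the current top set with the lowest lower confidence bound and the excluded arm with the highest upper confidence bound. This is what concentrates the sampling budget on the arms whose classification is still uncertain.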
Acknowledgements
Madalina M. Drugan was supported by the IWT-SBO project PERPETUAL (gr. nr. 110041) and FWO project “Multi-criteria RL” (gr. nr. G. 087814N).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Drugan, M.M. (2015). Scalarized Lower Upper Confidence Bound Algorithm. In: Dhaenens, C., Jourdan, L., Marmion, ME. (eds) Learning and Intelligent Optimization. LION 2015. Lecture Notes in Computer Science, vol 8994. Springer, Cham. https://doi.org/10.1007/978-3-319-19084-6_21
DOI: https://doi.org/10.1007/978-3-319-19084-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19083-9
Online ISBN: 978-3-319-19084-6
eBook Packages: Computer Science (R0)