Reinforcement Learning of Pareto-Optimal Multiobjective Policies Using Steering

  • Conference paper
AI 2015: Advances in Artificial Intelligence (AI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9457)

Abstract

There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available.
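This excerpt does not include the paper's update rules, but the general steering idea that both algorithms build on can be illustrated with a minimal sketch. The toy MDP, the target point, and the switching rule below are illustrative assumptions (a simplified rule in the spirit of the steering approach of Mannor and Shimkin [7, 8]), not the w-steering or Q-steering updates themselves: a non-stationary controller switches between two deterministic stationary base policies at each visit to a recurrent state, so that the long-run average return vector approaches a target point that no deterministic stationary policy can reach.

    import numpy as np

    # Toy 2-objective MDP with a single recurrent state and two actions.
    # Action 0 yields reward vector (1, 0); action 1 yields (0, 1).
    # Both actions return the agent to the recurrent state.
    REWARDS = np.array([[1.0, 0.0],
                        [0.0, 1.0]])

    # Two deterministic stationary base policies: always take action 0 / action 1.
    # Here their long-run average return vectors are simply the rows of REWARDS.
    base_returns = REWARDS.copy()

    def steer(target, steps=10_000):
        """Non-stationary combination of the base policies.

        At every visit to the recurrent state, pick the base policy whose
        long-run return vector best reduces the gap between the running
        average return and the target point (a simplified steering rule,
        not the paper's w-steering or Q-steering updates).
        """
        total = np.zeros(2)
        for t in range(1, steps + 1):
            avg = total / max(t - 1, 1)
            deficit = target - avg                   # direction still to be covered
            choice = int(np.argmax(base_returns @ deficit))
            total += REWARDS[choice]                 # follow that base policy for one cycle
        return total / steps

    if __name__ == "__main__":
        # No deterministic stationary policy can achieve (0.5, 0.5) in this MDP,
        # but the non-stationary combination approaches it.
        print(steer(np.array([0.5, 0.5])))           # ~ [0.5, 0.5]

In this toy problem the average return converges to (0.5, 0.5), a Pareto-optimal point unreachable by either base policy on its own; this is the benefit of non-stationarity that the abstract refers to.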

Notes

  1. MORL methods based on linear scalarisation are limited to discovering policies which lie on the convex hull of the Pareto front [17]. However, in the context of learning base policies this is actually advantageous, as policies lying inside the convex hull will not form part of any Pareto-optimal combination of base policies [15]. (A minimal sketch of scalarised base-policy learning follows these notes.)

  2. The other alternative is to treat no states as members of \(S_R\), but this would mean the agent never switches base policies, thereby losing the benefits of non-stationarity.
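The sketch below illustrates the scalarised learning step referred to in note 1, assuming a generic tabular Q-learning base learner; the environment, weight vector, and hyperparameters are illustrative assumptions rather than the paper's experimental setup. Sweeping the weight vector w over the simplex yields deterministic base policies on the convex hull of the Pareto front, which can then be combined by steering.

    import numpy as np

    rng = np.random.default_rng(0)

    def scalarised_q_learning(n_states, n_actions, reward_fn, transition_fn, w,
                              episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        """Learn one deterministic base policy by linearly scalarising the
        vector-valued reward with weight vector w (a generic sketch; the
        paper's base-policy learner may differ in its details).
        """
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = 0
            for _ in range(100):                        # fixed-length episodes
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))    # explore
                else:
                    a = int(np.argmax(Q[s]))            # exploit
                r_vec = reward_fn(s, a)                 # vector-valued reward
                r = float(np.dot(w, r_vec))             # linear scalarisation
                s_next = transition_fn(s, a)
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next
        return np.argmax(Q, axis=1)                     # greedy deterministic policy

    if __name__ == "__main__":
        # Toy single-state, two-action MDP with reward vectors (1, 0) and (0, 1).
        rewards = np.array([[1.0, 0.0], [0.0, 1.0]])
        policy = scalarised_q_learning(
            n_states=1, n_actions=2,
            reward_fn=lambda s, a: rewards[a],
            transition_fn=lambda s, a: 0,
            w=np.array([0.7, 0.3]))
        print(policy)                                    # -> [0] under this weighting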

References

  1. Brys, T., Van Moffaert, K., Van Vaerenbergh, K., Nowé, A.: On the behaviour of scalarization methods for the engagement of a wet clutch. In: The 12th International Conference on Machine Learning and Applications. IEEE (2013)

  2. Castelletti, A., Corani, G., Rizzolli, A., Soncini-Sessa, R., Weber, E.: Reinforcement learning in the operational management of a water system. In: IFAC Workshop on Modeling and Control in Environmental Issues, pp. 325–330 (2002)

  3. Chatterjee, K., Majumdar, R., Henzinger, T.A.: Markov decision processes with multiple objectives. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 325–336. Springer, Heidelberg (2006)

  4. Handa, H.: Solving multi-objective reinforcement learning problems by EDA-RL - acquisition of various strategies. In: Proceedings of the Ninth International Conference on Intelligent Systems Design and Applications, pp. 426–431 (2009)

  5. Kalyanakrishnan, S., Stone, P.: An empirical analysis of value function-based and policy search reinforcement learning. In: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, vol. 2, pp. 749–756. International Foundation for Autonomous Agents and Multiagent Systems (2009)

  6. Lizotte, D.J., Bowling, M., Murphy, S.A.: Efficient reinforcement learning with multiple reward functions for randomized clinical trial analysis. In: 27th International Conference on Machine Learning, pp. 695–702 (2010)

  7. Mannor, S., Shimkin, N.: The steering approach for multi-criteria reinforcement learning. In: Neural Information Processing Systems, pp. 1563–1570 (2001)

  8. Mannor, S., Shimkin, N.: A geometric approach to multi-criterion reinforcement learning. J. Mach. Learn. Res. 5, 325–360 (2004)

  9. Parisi, S., Pirotta, M., Smacchia, N., Bascetta, L., Restelli, M.: Policy gradient approaches for multi-objective sequential decision making. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 2323–2330. IEEE (2014)

  10. Roijers, D.M., Whiteson, S., Oliehoek, F.A.: Computing convex coverage sets for multi-objective coordination graphs. In: Perny, P., Pirlot, M., Tsoukiàs, A. (eds.) ADT 2013. LNCS, vol. 8176, pp. 309–323. Springer, Heidelberg (2013)

  11. Roijers, D., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multi-objective sequential decision-making. J. Artif. Intell. Res. 48, 67–113 (2013)

  12. Shelton, C.: Importance sampling for reinforcement learning with multiple objectives. AI Technical report 2001–003, MIT, August 2001

  13. Soh, H., Demiris, Y.: Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs). In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO 2011, pp. 713–720 (2011)

  14. Taylor, M.E., Whiteson, S., Stone, P.: Temporal difference and policy search methods for reinforcement learning: an empirical comparison. In: Proceedings of the National Conference on Artificial Intelligence, vol. 22, p. 1675 (2007)

  15. Vamplew, P., Dazeley, R., Barker, E., Kelarev, A.: Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks. In: Nicholson, A., Li, X. (eds.) AI 2009. LNCS, vol. 5866, pp. 340–349. Springer, Heidelberg (2009)

  16. Vamplew, P., Dazeley, R., Berry, A., Dekker, E., Issabekov, R.: Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)

  17. Vamplew, P., Yearwood, J., Dazeley, R., Berry, A.: On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. In: Wobcke, W., Zhang, M. (eds.) AI 2008. LNCS (LNAI), vol. 5360, pp. 372–378. Springer, Heidelberg (2008)

  18. Van Moffaert, K., Nowé, A.: Multi-objective reinforcement learning using sets of Pareto dominating policies. J. Mach. Learn. Res. 15, 3483–3512 (2014)

  19. Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge (1989)

  20. Whiteson, S., Taylor, M.E., Stone, P.: Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Auton. Agent. Multi-Agent Syst. 21(1), 1–35 (2010)

  21. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., da Fonseca, V.G.: Performance assessment of multiobjective optimisers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)


Author information

Corresponding author

Correspondence to Peter Vamplew.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vamplew, P., Issabekov, R., Dazeley, R., Foale, C. (2015). Reinforcement Learning of Pareto-Optimal Multiobjective Policies Using Steering. In: Pfahringer, B., Renz, J. (eds.) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science (LNAI), vol. 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_53

  • DOI: https://doi.org/10.1007/978-3-319-26350-2_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26349-6

  • Online ISBN: 978-3-319-26350-2

  • eBook Packages: Computer Science, Computer Science (R0)
