Abstract
Most reinforcement learning research optimizes an agent's policy for a single objective, yet many real-world applications are inherently characterized by multiple, possibly conflicting, objectives. Multi-objective reinforcement learning generalizes the standard setting to address the demand for trade-offs between competing objectives. Instead of single-policy techniques, which rely on heuristic information such as reward shaping, we propose a novel reinforcement learning method that learns a policy without a stated preference over objectives. We combine Pareto optimality theory with the deep Q network as a powerful tool for avoiding the construction of a synthetic reward function: the method computes a non-dominated set of actions, defined as the Pareto front, simultaneously, without assuming a weighting function or a linear scalarization procedure for action selection. We provide theoretical guarantees for the proposed method in a Grid World experiment. Experiments on multi-objective CartPole demonstrate that our approach exhibits better performance, faster convergence, relatively good stability, and more diverse solutions than the traditional multi-objective deep Q network.
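The core idea in the abstract, selecting among actions by Pareto dominance over vector-valued Q estimates rather than by a weighted scalarization, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pareto_front_actions`, the list-of-lists Q-value layout, and the uniform choice among non-dominated actions are all assumptions made for the example.

```python
import random

def pareto_front_actions(q_values):
    """Return indices of actions whose vector Q-value is not Pareto-dominated.

    q_values: list of per-action vectors, one entry per objective;
    larger is better on every objective. Action a dominates action b
    if it is at least as good on all objectives and strictly better on one.
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and \
               any(x > y for x, y in zip(a, b))

    return [i for i, qi in enumerate(q_values)
            if not any(dominates(qj, qi)
                       for j, qj in enumerate(q_values) if j != i)]

# Example: 4 actions, 2 objectives. Action 2 is dominated by action 1
# ([0.5, 0.9] beats [0.4, 0.4] on both objectives), so the front is [0, 1, 3].
q = [[1.0, 0.2],
     [0.5, 0.9],
     [0.4, 0.4],
     [0.9, 0.3]]
front = pareto_front_actions(q)          # [0, 1, 3]
action = random.choice(front)            # preference-free pick from the front
```

Because the front is computed directly from the vector Q-values, no weighting of objectives is ever specified, which is what the abstract means by learning "without preference".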
Yang, F., Huang, H., Shi, W. et al. PMDRL: Pareto-front-based multi-objective deep reinforcement learning. J Ambient Intell Human Comput 14, 12663–12672 (2023). https://doi.org/10.1007/s12652-022-04232-x