Abstract
Most reinforcement learning research optimizes an agent's policy for a single objective, yet many real-world applications are inherently characterized by multiple, possibly conflicting, objectives. Multi-objective reinforcement learning generalizes the standard setting to address the demand for trade-offs between competing objectives. Instead of single-policy techniques, which rely on heuristic information such as reward shaping, we propose a novel reinforcement learning method that learns a policy without a stated preference over objectives. We combine Pareto optimality theory with the deep Q network as a powerful tool for avoiding the construction of a synthetic reward function: the method computes a non-dominated set of actions, defined as the Pareto front, simultaneously, without assuming a weighting function or a linear scalarization procedure for action selection. We provide theoretical guarantees for the proposed method in a Grid World experiment. Experiments on multi-objective CartPole demonstrate that our approach exhibits better performance, faster convergence, relatively good stability, and more diverse solutions than the traditional multi-objective deep Q network.
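The core idea in the abstract, selecting among actions by Pareto dominance over vector-valued Q estimates rather than by a weighted scalarization, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pareto_front_actions`, the list-of-lists Q-value layout, and the uniform choice among non-dominated actions are all assumptions made for the example.

```python
import random

def pareto_front_actions(q_values):
    """Return indices of actions whose vector Q-value is not Pareto-dominated.

    q_values: list of per-action vectors, one entry per objective;
    larger is better on every objective. Action a dominates action b
    if it is at least as good on all objectives and strictly better on one.
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and \
               any(x > y for x, y in zip(a, b))

    return [i for i, qi in enumerate(q_values)
            if not any(dominates(qj, qi)
                       for j, qj in enumerate(q_values) if j != i)]

# Example: 4 actions, 2 objectives. Action 2 is dominated by action 1
# ([0.5, 0.9] beats [0.4, 0.4] on both objectives), so the front is [0, 1, 3].
q = [[1.0, 0.2],
     [0.5, 0.9],
     [0.4, 0.4],
     [0.9, 0.3]]
front = pareto_front_actions(q)          # [0, 1, 3]
action = random.choice(front)            # preference-free pick from the front
```

Because the front is computed directly from the vector Q-values, no weighting of objectives is ever specified, which is what the abstract means by learning "without preference".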
Yang, F., Huang, H., Shi, W. et al. PMDRL: Pareto-front-based multi-objective deep reinforcement learning. J Ambient Intell Human Comput 14, 12663–12672 (2023). https://doi.org/10.1007/s12652-022-04232-x