PMDRL: Pareto-front-based multi-objective deep reinforcement learning

Original Research, published in the Journal of Ambient Intelligence and Humanized Computing.

Abstract

Most reinforcement learning research optimizes an agent's policy for a single objective. Many real-world applications, however, inherently involve multiple, possibly conflicting, objectives. Multi-objective reinforcement learning generalizes the standard setting to address the need for trade-offs between competing objectives. Rather than relying on single-policy techniques, which depend on heuristic information such as reward shaping, we propose a novel reinforcement learning method that learns a policy without a fixed preference over objectives. We combine Pareto optimality theory with the deep Q network to avoid constructing a synthetic scalar reward function: actions are chosen by non-dominated sorting of vector-valued Q-estimates, which yields the Pareto front set directly, without assuming any weighting function or linear action-selection procedure. We provide theoretical guarantees for the proposed method in a Grid World experiment. Experiments on multi-objective CartPole demonstrate that our approach achieves better performance, faster convergence, relatively good stability, and more diverse solutions than the traditional multi-objective deep Q network.
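The core action-selection idea described in the abstract, choosing among Pareto-non-dominated actions instead of scalarizing objectives with weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `pareto_nondominated` and the example Q-value matrix are assumptions for demonstration, with rows as actions and columns as per-objective Q-estimates (higher is better on every objective).

```python
import numpy as np

def pareto_nondominated(q_values: np.ndarray) -> np.ndarray:
    """Return indices of actions whose Q-vectors are not Pareto-dominated.

    q_values: shape (n_actions, n_objectives). Action j dominates action i
    if j is >= i on every objective and strictly > on at least one.
    """
    n = q_values.shape[0]
    keep = []
    for i in range(n):
        dominated = False
        for j in range(n):
            if i == j:
                continue
            if np.all(q_values[j] >= q_values[i]) and np.any(q_values[j] > q_values[i]):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Hypothetical example: 4 actions, 2 objectives.
q = np.array([[1.0, 0.2],
              [0.8, 0.9],
              [0.5, 0.5],   # dominated by action 1 on both objectives
              [0.2, 1.0]])
front = pareto_nondominated(q)   # actions 0, 1, and 3 form the Pareto front
rng = np.random.default_rng(0)
action = rng.choice(front)       # greedy step: sample among non-dominated actions
```

Note that a weighted scalarization would collapse `q` to one number per action and pick a single argmax, whereas the sketch above retains every trade-off that no other action strictly improves on, which is what allows the method to produce a diverse set of solutions.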



Author information

Correspondence to Yanghe Feng.


Cite this article

Yang, F., Huang, H., Shi, W. et al. PMDRL: Pareto-front-based multi-objective deep reinforcement learning. J Ambient Intell Human Comput 14, 12663–12672 (2023). https://doi.org/10.1007/s12652-022-04232-x
