DOI: 10.1145/3665348.3665372

MDPo: Offline Reinforcement Learning Based on Mixture Density Policy Network

Published: 03 July 2024

Abstract

Offline reinforcement learning aims to enable agents to learn effective policies for decision-making tasks from a pre-existing dataset. This learning paradigm holds broad promise for domains with strict safety constraints, such as healthcare and robotic control. However, existing offline reinforcement learning algorithms often overlook the impact of the dataset's inherently multimodal distribution on policy optimization during training, which compromises model performance. To tackle this challenge, we introduce MDPo (Mixture Density Policy), a novel offline reinforcement learning algorithm based on a mixture density policy network. MDPo first trains the value function with an expectile regression loss. It then constructs the policy with a mixture density network and trains it under a distributional constraint, learning a high-quality policy model under the combined influence of the reward signal and the policy constraint. By leveraging mixture density networks, MDPo models the policy as a multimodal distribution, increasing its representational capacity so that it better fits the multimodal distribution of actions in the dataset; this stabilizes training and improves model performance. Experiments on the Antmaze tasks of the D4RL benchmark demonstrate that MDPo significantly outperforms existing state-of-the-art methods while also improving training stability.
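The abstract names two concrete ingredients: an expectile-style regression loss for the value function and a mixture density network that represents the policy as a multimodal (mixture-of-Gaussians) distribution. The following is a minimal PyTorch sketch of both, not the authors' implementation; the component count, hidden widths, the expectile parameter tau, and the likelihood-based reading of the "distributional constraint" are illustrative assumptions.

# Minimal sketch (assumed, not the paper's code) of the two pieces named in the
# abstract: an expectile regression loss for value learning and a mixture
# density policy network. K, hidden sizes, and tau are illustrative choices.
import torch
import torch.nn as nn


def expectile_loss(pred, target, tau=0.7):
    """Asymmetric (expectile) regression loss for training the value function."""
    diff = target - pred
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()


class MixtureDensityPolicy(nn.Module):
    """Policy modelled as a mixture of K diagonal Gaussians over actions."""

    def __init__(self, state_dim, action_dim, num_components=5, hidden=256):
        super().__init__()
        self.k, self.action_dim = num_components, action_dim
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.logits = nn.Linear(hidden, num_components)                  # mixture weights
        self.means = nn.Linear(hidden, num_components * action_dim)     # component means
        self.log_stds = nn.Linear(hidden, num_components * action_dim)  # component scales

    def distribution(self, state):
        h = self.trunk(state)
        mix = torch.distributions.Categorical(logits=self.logits(h))
        mean = self.means(h).view(-1, self.k, self.action_dim)
        std = self.log_stds(h).view(-1, self.k, self.action_dim).clamp(-5.0, 2.0).exp()
        comps = torch.distributions.Independent(torch.distributions.Normal(mean, std), 1)
        return torch.distributions.MixtureSameFamily(mix, comps)

    def log_prob(self, state, action):
        return self.distribution(state).log_prob(action)


def policy_constraint_loss(policy, states, actions):
    """One plausible reading of the distributional constraint: keep the mixture
    policy close to the dataset's action distribution by maximising the
    log-likelihood of dataset actions under the policy."""
    return -policy.log_prob(states, actions).mean()

In this sketch the multimodality is captured by the mixture head: each of the K components can track a different mode of the behaviour data, which is the representational advantage the abstract attributes to the mixture density policy.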


    Published In

    GAIIS '24: Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security
May 2024, 439 pages
ISBN: 9798400709562
DOI: 10.1145/3665348

    Publisher

    Association for Computing Machinery

    New York, NY, United States

