DOI: 10.1145/3665348.3665372

MDPo: Offline Reinforcement Learning Based on Mixture Density Policy Network

Published: 03 July 2024

Abstract

Offline reinforcement learning aims to enable agents to learn effective policies for decision-making tasks from a pre-existing dataset. This learning paradigm holds broad promise for domains with strict safety constraints, such as healthcare and robotic control. However, existing offline reinforcement learning algorithms often overlook the impact of the dataset's inherently multimodal distribution on policy optimization during training, which compromises model performance. To tackle this challenge, we introduce MDPo (Mixture Density Policy), a novel offline reinforcement learning algorithm based on a mixture density policy network. MDPo first trains the value function with an expectile regression loss. It then constructs the policy with a mixture density network and trains it under a distributional constraint, learning a high-quality policy model under the combined influence of the reward signal and the policy constraint. By leveraging mixture density networks, MDPo models the policy as a multimodal distribution, increasing its representational capacity so that it better fits the multimodal distribution of actions in the dataset; this stabilizes training and improves model performance. Experiments on the Antmaze tasks of the D4RL benchmark demonstrate that MDPo significantly outperforms existing state-of-the-art methods while also improving training stability.
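The abstract names two concrete ingredients: an expectile-style regression loss for the value function and a mixture density network that represents the policy as a multimodal (mixture-of-Gaussians) distribution. The following is a minimal PyTorch sketch of both, not the authors' implementation; the component count, hidden widths, the expectile parameter tau, and the likelihood-based reading of the "distributional constraint" are illustrative assumptions.

# Minimal sketch (assumed, not the paper's code) of the two pieces named in the
# abstract: an expectile regression loss for value learning and a mixture
# density policy network. K, hidden sizes, and tau are illustrative choices.
import torch
import torch.nn as nn


def expectile_loss(pred, target, tau=0.7):
    """Asymmetric (expectile) regression loss for training the value function."""
    diff = target - pred
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()


class MixtureDensityPolicy(nn.Module):
    """Policy modelled as a mixture of K diagonal Gaussians over actions."""

    def __init__(self, state_dim, action_dim, num_components=5, hidden=256):
        super().__init__()
        self.k, self.action_dim = num_components, action_dim
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.logits = nn.Linear(hidden, num_components)                  # mixture weights
        self.means = nn.Linear(hidden, num_components * action_dim)     # component means
        self.log_stds = nn.Linear(hidden, num_components * action_dim)  # component scales

    def distribution(self, state):
        h = self.trunk(state)
        mix = torch.distributions.Categorical(logits=self.logits(h))
        mean = self.means(h).view(-1, self.k, self.action_dim)
        std = self.log_stds(h).view(-1, self.k, self.action_dim).clamp(-5.0, 2.0).exp()
        comps = torch.distributions.Independent(torch.distributions.Normal(mean, std), 1)
        return torch.distributions.MixtureSameFamily(mix, comps)

    def log_prob(self, state, action):
        return self.distribution(state).log_prob(action)


def policy_constraint_loss(policy, states, actions):
    """One plausible reading of the distributional constraint: keep the mixture
    policy close to the dataset's action distribution by maximising the
    log-likelihood of dataset actions under the policy."""
    return -policy.log_prob(states, actions).mean()

In this sketch the multimodality is captured by the mixture head: each of the K components can track a different mode of the behaviour data, which is the representational advantage the abstract attributes to the mixture density policy.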


    Published In

    GAIIS '24: Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security
May 2024, 439 pages
ISBN: 9798400709562
DOI: 10.1145/3665348

    Publisher

    Association for Computing Machinery

    New York, NY, United States

