Abstract
To address the distributional shift problem in offline reinforcement learning, policy constraint methods minimize the divergence between the learned policy and the behavior policy. One family of these methods, regularization constraint methods, adds a regularization term to an online reinforcement learning algorithm. However, such regularization terms can be overly restrictive and are limited by the expressive power of the underlying generative model. To relax the strict distribution-matching constraint, this paper proposes the TD3 + diffusion-based BC algorithm, which augments TD3 with a behavior cloning regularization term built on a diffusion model. Because the diffusion model is highly expressive, this term achieves support-set matching: the policy is steered toward actions that have high probability under the behavior policy in a given state and away from out-of-distribution actions. Experimental results show that our algorithm matches or surpasses state-of-the-art algorithms on most tasks in the D4RL benchmark.
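To make the idea concrete, below is a minimal PyTorch-style sketch of this kind of objective, not the paper's exact implementation: a TD3-style actor loss combined with a diffusion-based behavior cloning regularizer, where the denoising loss of a DDPM noise-prediction network conditioned on the state (Ho et al., 2020) replaces the MSE behavior cloning term of TD3+BC (Fujimoto and Gu, 2021). The names `eps_model`, `alpha`, and the Q-value scaling are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the authors' code): TD3 actor update
# with a diffusion-based behavior cloning regularizer.
import torch
import torch.nn.functional as F

def diffusion_bc_loss(eps_model, state, action, n_timesteps, alphas_cumprod):
    """DDPM denoising loss of a state-conditioned noise-prediction network
    evaluated at `action`. A low loss indicates the action lies within the
    support of the behavior policy learned from the offline dataset."""
    b = action.shape[0]
    t = torch.randint(0, n_timesteps, (b,), device=action.device)
    noise = torch.randn_like(action)
    a_bar = alphas_cumprod[t].unsqueeze(-1)          # \bar{alpha}_t, shape (b, 1)
    noisy_action = a_bar.sqrt() * action + (1 - a_bar).sqrt() * noise
    pred_noise = eps_model(noisy_action, t, state)   # assumed signature
    return F.mse_loss(pred_noise, noise)

def actor_loss(actor, critic, eps_model, state, n_timesteps, alphas_cumprod,
               alpha=2.5):
    """TD3-style policy improvement term plus the diffusion BC regularizer."""
    pi_action = actor(state)
    q = critic(state, pi_action)
    lmbda = alpha / q.abs().mean().detach()          # TD3+BC-style Q scaling
    bc = diffusion_bc_loss(eps_model, state, pi_action,
                           n_timesteps, alphas_cumprod)
    return -lmbda * q.mean() + bc
```

In such a setup, `eps_model` would first be fit to the dataset's (state, action) pairs with the same denoising loss, so that penalizing the policy's actions by this loss discourages actions outside the behavior distribution's support.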
References
Chen, H., Lu, C., Ying, C., Su, H., Zhu, J.: Offline reinforcement learning via high-fidelity generative behavior modeling. arXiv preprint arXiv:2209.14548 (2022)
Emmons, S., Eysenbach, B., Kostrikov, I., Levine, S.: RvS: what is essential for offline RL via supervised learning? In: International Conference on Learning Representations (2022)
Fu, J., Kumar, A., Nachum, O., Tucker, G., Levine, S.: D4RL: datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219 (2020)
Fujimoto, S., Gu, S.S.: A minimalist approach to offline reinforcement learning. Adv. Neural Inf. Process. Syst. 34, 20132–20145 (2021)
Fujimoto, S., Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning, pp. 1587–1596. PMLR (2018)
Fujimoto, S., Meger, D., Precup, D.: Off-policy deep reinforcement learning without exploration. In: International Conference on Machine Learning, pp. 2052–2062. PMLR (2019)
Ghasemipour, S.K.S., Schuurmans, D., Gu, S.S.: EMaQ: expected-max Q-learning operator for simple yet effective offline and online RL. In: International Conference on Machine Learning, pp. 3682–3691. PMLR (2021)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020)
Janner, M., Du, Y., Tenenbaum, J., Levine, S.: Planning with diffusion for flexible behavior synthesis. In: International Conference on Machine Learning (2022)
Kidambi, R., Rajeswaran, A., Netrapalli, P., Joachims, T.: MOReL: model-based offline reinforcement learning. Adv. Neural Inf. Process. Syst. 33, 21810–21823 (2020)
Kostrikov, I., Fergus, R., Tompson, J., Nachum, O.: Offline reinforcement learning with Fisher divergence critic regularization. In: International Conference on Machine Learning, pp. 5774–5783. PMLR (2021)
Kostrikov, I., Nair, A., Levine, S.: Offline reinforcement learning with implicit Q-learning. In: International Conference on Learning Representations (2022)
Kumar, A., Fu, J., Soh, M., Tucker, G., Levine, S.: Stabilizing off-policy Q-learning via bootstrapping error reduction. Adv. Neural Inf. Process. Syst. 32 (2019)
Kumar, A., Zhou, A., Tucker, G., Levine, S.: Conservative Q-learning for offline reinforcement learning. Adv. Neural Inf. Process. Syst. 33, 1179–1191 (2020)
Lv, K., Sheng, H., Xiong, Z., Li, W., Zheng, L.: Pose-based view synthesis for vehicles: a perspective aware method. IEEE Trans. Image Process. 29, 5163–5174 (2020)
Lv, K., Sheng, H., Xiong, Z., Li, W., Zheng, L.: Improving driver gaze prediction with reinforced attention. IEEE Trans. Multimed. 23, 4198–4207 (2021)
Lyu, J., Li, X., Lu, Z.: Double check your state before trusting it: confidence-aware bidirectional offline model-based imagination. In: Advances in Neural Information Processing Systems (2022)
Nair, A., Dalal, M., Gupta, A., Levine, S.: Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359 (2020)
Peng, X.B., Kumar, A., Zhang, G., Levine, S.: Advantage-weighted regression: simple and scalable off-policy reinforcement learning. arXiv preprint arXiv:1910.00177 (2019)
Sheng, H., Lv, K., Liu, Y., Ke, W., Lyu, W., Xiong, Z., Li, W.: Combining pose invariant and discriminative features for vehicle reidentification. IEEE Internet Things J. 8(5), 3189–3200 (2021)
Sinha, S., Mandlekar, A., Garg, A.: S4RL: surprisingly simple self-supervision for offline reinforcement learning in robotics. In: Conference on Robot Learning, pp. 907–917. PMLR (2022)
Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (2021)
Wang, J., Li, W., Jiang, H., Zhu, G., Li, S., Zhang, C.: Offline reinforcement learning with reverse model-based imagination. Adv. Neural Inf. Process. Syst. 34, 29420–29432 (2021)
Wang, S., Wu, Z., Hu, X., Lin, Y., Lv, K.: Skill-based hierarchical reinforcement learning for target visual navigation. IEEE Trans. Multimed., 1–13 (2023). https://doi.org/10.1109/TMM.2023.3243618
Wang, Z., Hunt, J.J., Zhou, M.: Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv preprint arXiv:2208.06193 (2022)
Wu, Y., Tucker, G., Nachum, O.: Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361 (2019)
Yu, T., Thomas, G., Yu, L., Ermon, S., Zou, J.Y., Levine, S., Finn, C., Ma, T.: MOPO: model-based offline policy optimization. Adv. Neural Inf. Process. Syst. 33, 14129–14142 (2020)
Zhang, H., Lin, Y., Han, S., Lv, K.: Lexicographic actor-critic deep reinforcement learning for urban autonomous driving. IEEE Trans. Veh. Technol. 72(4), 4308–4319 (2023)
Zhuang, Z., Lei, K., Liu, J., Wang, D., Guo, Y.: Behavior proximal policy optimization. arXiv preprint arXiv:2302.11312 (2023)
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62206013.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, H., Lin, Y., Han, S., Lv, K. (2023). Offline Reinforcement Learning with Diffusion-Based Behavior Cloning Term. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, A.M., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science, vol. 14120. Springer, Cham. https://doi.org/10.1007/978-3-031-40292-0_22
DOI: https://doi.org/10.1007/978-3-031-40292-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40291-3
Online ISBN: 978-3-031-40292-0
eBook Packages: Computer Science, Computer Science (R0)