Abstract
Viewing offline reinforcement learning (RL) through the lens of conditional generative modeling has gained increasing acceptance among researchers as a sequence modeling approach. Diffusion models are state-of-the-art generative methods, but their repeated forward and reverse diffusion steps are computationally demanding for large, high-dimensional data. Here we develop a new policy for offline RL based on Poisson flow generative modeling, which does not rely on Gaussian assumptions. Our method achieves improved evaluation metrics, faster sample generation, and increased robustness to hyperparameters and model architectures. It also enables probing how much the choice of underlying generative framework matters for offline sequence modeling. On the D4RL and Minari benchmarks, our method matches state-of-the-art performance with fewer computational resources, further validating conditional generative modeling for decision-making tasks.
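The Poisson flow idea behind the method can be illustrated with a toy sketch. Poisson flow generative models treat each data point, lifted onto the z = 0 hyperplane of an augmented space, as a unit charge; samples are produced by integrating backward along the resulting electric field from a distant shell down to the data plane. The sketch below uses the empirical field computed directly from a small dataset rather than a learned network, and the names `empirical_poisson_field` and `sample_pfgm` are illustrative, not from the paper; the paper's policy additionally conditions this process on RL trajectories.

```python
import numpy as np

def empirical_poisson_field(x, data):
    """Empirical Poisson (electric) field at augmented point x,
    treating each training sample (lifted to z = 0) as a unit charge.
    x: shape (N+1,); data: shape (M, N) training points."""
    N = data.shape[1]
    # Lift training data into the augmented space with z = 0.
    charges = np.concatenate([data, np.zeros((data.shape[0], 1))], axis=1)
    diff = x - charges                                  # (M, N+1)
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    field = (diff / dist ** (N + 1)).mean(axis=0)       # points away from data
    return field / (np.linalg.norm(field) + 1e-12)

def sample_pfgm(data, z_max=40.0, n_steps=500, rng=None):
    """Generate one sample by following the field backward from a
    distant point (large z) down to the z = 0 data hyperplane."""
    if rng is None:
        rng = np.random.default_rng()
    N = data.shape[1]
    # Crude prior: start far above the plane with a random lateral offset.
    x = np.concatenate([rng.normal(scale=z_max, size=N), [z_max]])
    dz = z_max / n_steps
    for _ in range(n_steps):
        v = empirical_poisson_field(x, data)
        if v[-1] <= 1e-9:          # field no longer points upward; stop
            break
        # Reparameterize the ODE by z (dx/dz = E_x / E_z) and step z down.
        x = x - (v / v[-1]) * dz
        if x[-1] <= 1e-3:          # reached the data hyperplane
            break
    return x[:N]                   # drop the augmented z coordinate
```

Because the field far from the data behaves like that of a single point charge, the backward integration contracts the starting point toward the data support, and the near-field structure then shapes the final sample; this is the mechanism that replaces the Gaussian forward/reverse chain of diffusion models.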
Acknowledgment
The authors gratefully acknowledge the financial support of the Guangzhou Basic and Applied Basic Research Project (2023A04J1725) and the National Natural Science Foundation of China (No. 62102107).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cai, H., Zhang, Z., Yao, Z., Mo, K., Chen, D., Yan, H. (2024). Decision Poisson: From Universal Gravitation to Offline Reinforcement Learning. In: Vaidya, J., Gabbouj, M., Li, J. (eds) Artificial Intelligence Security and Privacy. AIS&P 2023. Lecture Notes in Computer Science, vol 14509. Springer, Singapore. https://doi.org/10.1007/978-981-99-9785-5_31
DOI: https://doi.org/10.1007/978-981-99-9785-5_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9784-8
Online ISBN: 978-981-99-9785-5
eBook Packages: Computer Science (R0)