Abstract
Viewing offline reinforcement learning (RL) through the lens of conditional generative modeling has gained increasing acceptance among researchers as a sequence modeling approach. Diffusion models are state-of-the-art generative methods, but their repeated forward and reverse diffusion steps are computationally demanding for large, high-dimensional data. Here we develop a new policy for offline RL based on Poisson flow generative modeling, which does not rely on Gaussian assumptions. Our method achieves improved evaluation metrics, faster sample generation, and increased robustness to hyperparameters and model architectures. It also enables probing how much the choice of underlying generative framework matters for offline sequence modeling. On the D4RL and Minari benchmarks, our method matches state-of-the-art performance with fewer computational resources, further validating conditional generative modeling for decision-making tasks.
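The Poisson flow idea behind the method can be illustrated with a toy sketch. Poisson flow generative models treat each data point, lifted onto the z = 0 hyperplane of an augmented space, as a unit charge; samples are produced by integrating backward along the resulting electric field from a distant shell down to the data plane. The sketch below uses the empirical field computed directly from a small dataset rather than a learned network, and the names `empirical_poisson_field` and `sample_pfgm` are illustrative, not from the paper; the paper's policy additionally conditions this process on RL trajectories.

```python
import numpy as np

def empirical_poisson_field(x, data):
    """Empirical Poisson (electric) field at augmented point x,
    treating each training sample (lifted to z = 0) as a unit charge.
    x: shape (N+1,); data: shape (M, N) training points."""
    N = data.shape[1]
    # Lift training data into the augmented space with z = 0.
    charges = np.concatenate([data, np.zeros((data.shape[0], 1))], axis=1)
    diff = x - charges                                  # (M, N+1)
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    field = (diff / dist ** (N + 1)).mean(axis=0)       # points away from data
    return field / (np.linalg.norm(field) + 1e-12)

def sample_pfgm(data, z_max=40.0, n_steps=500, rng=None):
    """Generate one sample by following the field backward from a
    distant point (large z) down to the z = 0 data hyperplane."""
    if rng is None:
        rng = np.random.default_rng()
    N = data.shape[1]
    # Crude prior: start far above the plane with a random lateral offset.
    x = np.concatenate([rng.normal(scale=z_max, size=N), [z_max]])
    dz = z_max / n_steps
    for _ in range(n_steps):
        v = empirical_poisson_field(x, data)
        if v[-1] <= 1e-9:          # field no longer points upward; stop
            break
        # Reparameterize the ODE by z (dx/dz = E_x / E_z) and step z down.
        x = x - (v / v[-1]) * dz
        if x[-1] <= 1e-3:          # reached the data hyperplane
            break
    return x[:N]                   # drop the augmented z coordinate
```

Because the field far from the data behaves like that of a single point charge, the backward integration contracts the starting point toward the data support, and the near-field structure then shapes the final sample; this is the mechanism that replaces the Gaussian forward/reverse chain of diffusion models.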
Acknowledgment
The authors gratefully acknowledge the financial support of the Guangzhou Basic and Applied Basic Research Project (2023A04J1725) and the National Natural Science Foundation of China (No. 62102107).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cai, H., Zhang, Z., Yao, Z., Mo, K., Chen, D., Yan, H. (2024). Decision Poisson: From Universal Gravitation to Offline Reinforcement Learning. In: Vaidya, J., Gabbouj, M., Li, J. (eds) Artificial Intelligence Security and Privacy. AIS&P 2023. Lecture Notes in Computer Science, vol 14509. Springer, Singapore. https://doi.org/10.1007/978-981-99-9785-5_31
DOI: https://doi.org/10.1007/978-981-99-9785-5_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9784-8
Online ISBN: 978-981-99-9785-5
eBook Packages: Computer Science (R0)