
Decision Poisson: From Universal Gravitation to Offline Reinforcement Learning

  • Conference paper
Artificial Intelligence Security and Privacy (AIS&P 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14509)

Abstract

Viewing offline reinforcement learning (RL) through the lens of conditional generative modeling has steadily gained acceptance as a sequence modeling approach. Diffusion models are the state of the art among such methods, but their repeated forward and reverse diffusion steps can be computationally demanding for large, high-dimensional data. Here we develop a new policy for offline RL based on Poisson flow generative modeling, which does not rely on Gaussian assumptions. Our method achieves improved evaluation metrics, faster sample generation, and increased robustness to hyperparameters and model architectures, and it also lets us probe how much the underlying generative framework matters for offline sequence modeling. On the D4RL and Minari benchmarks, our method matches state-of-the-art performance with fewer resources, further validating conditional generative modeling for decision tasks.
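
To illustrate the kind of sampling procedure such a policy implies, the following is a minimal sketch, not the authors' implementation. It assumes a trained network `field_net(x, z, state)` that predicts the Poisson field in an augmented (action, z) space, and it integrates the field line from a large augmented coordinate back toward z ≈ 0 to produce one action conditioned on the current state. The function name, signature, default dimensions, and step count are all illustrative assumptions.

```python
# Hypothetical sketch of sampling an action from a Poisson-flow-style
# conditional policy; `field_net` and its conditioning are assumptions.
import torch

def sample_action(field_net, state, action_dim=6, z_max=40.0, n_steps=100):
    """Integrate a Poisson-flow-style ODE from far field (large z) back to
    the data hyperplane (z -> 0) to draw one action sample."""
    x = torch.randn(action_dim) * z_max      # crude far-field initialization
    z = torch.tensor(z_max)
    dz = z_max / n_steps
    for _ in range(n_steps):
        # Assumed network: predicts the Poisson field in the augmented
        # (action, z) space, conditioned on the current state.
        E = field_net(x, z, state)           # shape: (action_dim + 1,)
        E_x, E_z = E[:-1], E[-1]
        # Follow the field line backward toward the data hyperplane:
        # dx/dz = E_x / E_z, stepping z down by dz each iteration.
        x = x - (E_x / (E_z + 1e-8)) * dz
        z = z - dz
    return x                                  # approximate action at z ~ 0
```

Because the integration runs for a fixed, modest number of steps and involves no repeated noising and denoising passes, a sampler of this shape is one way such a policy could generate actions faster than a comparable diffusion policy.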

Acknowledgment

The authors gratefully acknowledge financial support from the Guangzhou Basic and Applied Basic Research Project (2023A04J1725) and the National Natural Science Foundation of China (No. 62102107).

Author information

Corresponding author

Correspondence to Hongyang Yan.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cai, H., Zhang, Z., Yao, Z., Mo, K., Chen, D., Yan, H. (2024). Decision Poisson: From Universal Gravitation to Offline Reinforcement Learning. In: Vaidya, J., Gabbouj, M., Li, J. (eds) Artificial Intelligence Security and Privacy. AIS&P 2023. Lecture Notes in Computer Science, vol 14509. Springer, Singapore. https://doi.org/10.1007/978-981-99-9785-5_31

  • DOI: https://doi.org/10.1007/978-981-99-9785-5_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9784-8

  • Online ISBN: 978-981-99-9785-5

  • eBook Packages: Computer Science, Computer Science (R0)
