Approximate Bayesian reinforcement learning based on estimation of plant

Published in Autonomous Robots (2020).

Abstract

This study proposes an approximate parametric model-based Bayesian reinforcement learning approach for robots, based on online Bayesian estimation and online planning for an estimated model. The approach is designed to learn a robotic task from a few real-world samples and to be robust against model uncertainty, within feasible computational resources. It employs two-stage modeling: (1) a parametric differential-equation model with a few parameters, built from prior knowledge such as the equations of motion, and (2) a parametric model that interpolates a finite number of transition probability models for online estimation and planning. The online Bayesian estimation is modified to be robust against approximation errors between the parametric model and the real plant, and the policy planned for the interpolating model is proven to possess a form of theoretical robustness. Numerical simulations and hardware experiments of a planar peg-in-hole task demonstrate the effectiveness of the proposed approach.
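To make the estimate-then-plan structure concrete, the following is a minimal Python sketch, not the authors' implementation. A finite grid of candidate parameters indexes transition models obtained by interpolating two anchor models; a discrete posterior over the grid is updated online from observed real-plant transitions; and a greedy policy is planned by value iteration on the posterior-weighted model. All sizes and names (the random anchor models P0 and P1, the likelihood floor, the reward table R) are illustrative assumptions; in particular, the likelihood floor is only a crude stand-in for the paper's robustness modification to the Bayesian update.

```python
import numpy as np

# Illustrative sizes: |S| states, |A| actions, K candidate parameter values.
S, A, K = 20, 3, 7
rng = np.random.default_rng(0)

# Two "anchor" transition models (row-stochastic, shape (S, A, S)).
# In the paper these would come from the differential-equation plant model
# evaluated at selected parameter values; here they are random stand-ins.
P0 = rng.dirichlet(np.ones(S), size=(S, A))
P1 = rng.dirichlet(np.ones(S), size=(S, A))

# Interpolating parametric model: P(theta) is a convex combination of the
# anchor models, evaluated on a finite grid of candidate thetas.
thetas = np.linspace(0.0, 1.0, K)
models = np.stack([(1 - t) * P0 + t * P1 for t in thetas])  # (K, S, A, S)

posterior = np.full(K, 1.0 / K)  # uniform prior over candidate parameters

def bayes_update(posterior, s, a, s_next, floor=1e-3):
    """One online Bayesian update from an observed real-plant transition.
    The likelihood floor keeps the posterior from collapsing when no
    candidate model matches the real plant exactly (a crude stand-in for
    the paper's robustness modification)."""
    lik = np.maximum(models[:, s, a, s_next], floor)
    post = posterior * lik
    return post / post.sum()

def plan_greedy(posterior, R, gamma=0.95, iters=300):
    """Value iteration on the posterior-mixture (interpolated) model."""
    P_mix = np.tensordot(posterior, models, axes=1)  # (S, A, S)
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P_mix @ V   # expected Q-values, shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)         # greedy policy: state -> action

# Estimate-then-plan loop on one illustrative observed transition.
R = rng.uniform(size=(S, A))        # hypothetical reward table
posterior = bayes_update(posterior, s=0, a=1, s_next=4)
policy = plan_greedy(posterior, R)
```

In the paper, the anchor transition models would be generated from the parametric differential-equation plant model (e.g., the equations of motion) at selected parameter values, and planning is performed online within a computational budget; the sketch conveys only the overall loop.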



Acknowledgements

Part of this work was financially supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Culture, and Sports of Japan.

Author information

Correspondence to Kei Senda or Toru Hishinuma.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (MP4, 5417 KB)


About this article


Cite this article

Senda, K., Hishinuma, T. & Tani, Y. Approximate Bayesian reinforcement learning based on estimation of plant. Auton Robot 44, 845–857 (2020). https://doi.org/10.1007/s10514-020-09901-4

