Abstract
This study proposes an approximate parametric model-based Bayesian reinforcement learning approach for robots, based on online Bayesian estimation and online planning for the estimated model. The approach is designed to learn a robotic task from a few real-world samples and to be robust against model uncertainty within feasible computational resources. It employs two-stage modeling consisting of (1) a parametric differential equation model with a few parameters based on prior knowledge, such as equations of motion, and (2) a parametric model that interpolates a finite number of transition probability models for online estimation and planning. The online Bayesian estimation is modified to be robust against approximation errors of the parametric model with respect to the real plant, and the policy planned for the interpolating model is proven to possess a form of theoretical robustness. Numerical simulations and hardware experiments of a planar peg-in-hole task demonstrate the effectiveness of the proposed approach.
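To make the estimation stage concrete, the following is a minimal illustrative sketch (not the authors' implementation) of online Bayesian estimation over a finite set of candidate plant parameters, in the spirit of the two-stage modeling described above. The names `candidate_params`, `transition_likelihood`, and the placeholder linear dynamics are assumptions introduced only for this example.

```python
import numpy as np

def transition_likelihood(s_next, s, a, theta, noise_std=0.05):
    """Likelihood p(s' | s, a, theta) under a simple parametric dynamics
    model s' = f(s, a; theta) + Gaussian noise. The linear f used here is a
    placeholder standing in for an equations-of-motion-based model."""
    predicted = s + theta * a  # placeholder dynamics
    return np.exp(-0.5 * ((s_next - predicted) / noise_std) ** 2)

def bayes_update(posterior, candidate_params, s, a, s_next):
    """One step of online Bayesian estimation: reweight each candidate
    parameter by the likelihood of the observed transition, then normalize."""
    likelihoods = np.array(
        [transition_likelihood(s_next, s, a, th) for th in candidate_params]
    )
    posterior = posterior * likelihoods
    return posterior / posterior.sum()

# Example usage: a coarse grid of candidate parameters with a uniform prior.
candidate_params = np.linspace(0.5, 1.5, 5)
posterior = np.ones_like(candidate_params) / len(candidate_params)
s, a, s_next = 0.0, 1.0, 0.9  # one observed real-world transition
posterior = bayes_update(posterior, candidate_params, s, a, s_next)
# A planner could then act on the posterior-weighted (interpolated) model.
```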











Acknowledgements
Part of this work was financially supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Culture, and Sports of Japan.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (mp4 5417 KB)
Cite this article
Senda, K., Hishinuma, T. & Tani, Y. Approximate Bayesian reinforcement learning based on estimation of plant. Auton Robot 44, 845–857 (2020). https://doi.org/10.1007/s10514-020-09901-4