Abstract
In this paper, we propose using hierarchical action decomposition to make Bayesian model-based reinforcement learning more efficient and feasible for larger problems. We formulate Bayesian hierarchical reinforcement learning as a partially observable semi-Markov decision process (POSMDP). The main POSMDP task is partitioned into a hierarchy of POSMDP subtasks. Each subtask may consist only of primitive actions or may hierarchically invoke the policies of other subtasks, because the policies of lower-level subtasks serve as macro-actions in higher-level subtasks. Under this hierarchical action decomposition, a solution is obtained by solving the lower-level subtasks first and then the higher-level ones. Because each formulated POSMDP has a continuous state space, we sample from a prior belief to build an approximate model for each subtask and then solve it with the recently introduced Monte Carlo Value Iteration with Macro-Actions solver. We name this method Monte Carlo Bayesian Hierarchical Reinforcement Learning. Simulation results show that, by exploiting the action hierarchy, our algorithm significantly outperforms flat Bayesian reinforcement learning in both reward and, especially, solving time, which improves by at least an order of magnitude.
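The abstract describes the method only at this high level, but the bottom-up solving order can be made concrete with a short sketch. The Python below is a minimal illustration under assumed names: Subtask, sample_models, and mcvi_solve are hypothetical placeholders rather than the authors' implementation. It shows how each solved subtask policy could be passed upward as a macro-action to its parent.

from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative types and helpers only: Subtask, sample_models, and
# mcvi_solve are hypothetical stand-ins, not the authors' code.

@dataclass
class Subtask:
    name: str
    primitive_actions: List[str]
    prior_belief: Callable[[], object]  # sampler over POSMDP model parameters
    children: List["Subtask"] = field(default_factory=list)

def sample_models(prior_belief: Callable[[], object], n_samples: int) -> list:
    # The state space is continuous, so the subtask model is
    # approximated by sampling candidate models from the prior belief.
    return [prior_belief() for _ in range(n_samples)]

def mcvi_solve(models: list, actions: list):
    # Placeholder for the Monte Carlo Value Iteration with
    # Macro-Actions solver; returns a policy for the subtask.
    ...

def solve_hierarchy(task: Subtask):
    # Solve every child subtask first; each child's policy becomes
    # a macro-action available at this level.
    macro_actions = [solve_hierarchy(child) for child in task.children]
    models = sample_models(task.prior_belief, n_samples=100)
    return mcvi_solve(models, actions=task.primitive_actions + macro_actions)

Calling solve_hierarchy on the root subtask then yields a policy for the main task, with lower-level subtasks solved exactly once before any parent that uses them.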
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2010-0012609).
Cite this article
Vien, N., Ngo, H., Lee, S. et al. Approximate planning for Bayesian hierarchical reinforcement learning. Appl Intell 41, 808–819 (2014). https://doi.org/10.1007/s10489-014-0565-6