Abstract
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make sequential decisions in order to optimize a given performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may leave it stuck at a local optimum, and 2) exploring to gather new knowledge that is expected to improve its current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e. by solving an equivalent partially observable Markov decision process (POMDP). However, solving this POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, yielding a hierarchical POMDP formulation that can be solved online with a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro actions to accelerate learning. In the experiment section, we demonstrate that the proposed approach scales up to much larger problems and that the agent is able to discover useful subgoals that speed up Bayesian reinforcement learning.
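To make the idea of belief planning concrete, the short Python sketch below maintains a Dirichlet posterior over the transition model of a small discrete MDP and selects actions by Monte-Carlo rollouts through models sampled from that posterior. It illustrates only the generic Bayes-adaptive planning idea, not the hierarchical sample-based solver developed in the paper; the names (DirichletBelief, plan_action, reward_fn) and the flat random rollout policy are assumptions made for this sketch.

import random
from collections import defaultdict

class DirichletBelief:
    """Independent Dirichlet posteriors over the transition dynamics of a
    small discrete MDP (a common belief representation in Bayesian model-based RL)."""
    def __init__(self, n_states, n_actions, prior=1.0):
        self.n_states = n_states
        self.n_actions = n_actions
        self.counts = defaultdict(lambda: [prior] * n_states)  # (s, a) -> Dirichlet parameters

    def sample_next_state(self, s, a):
        # Draw a transition distribution from the posterior (via Gamma samples),
        # then draw the next state from that sampled distribution.
        weights = [random.gammavariate(alpha, 1.0) for alpha in self.counts[(s, a)]]
        r, acc = random.uniform(0.0, sum(weights)), 0.0
        for s2, w in enumerate(weights):
            acc += w
            if r <= acc:
                return s2
        return self.n_states - 1

    def update(self, s, a, s2):
        # Posterior update after observing a real transition (s, a) -> s2.
        self.counts[(s, a)][s2] += 1.0

def plan_action(belief, reward_fn, state, n_actions, depth=15, n_rollouts=200, gamma=0.95):
    """Choose an action by Monte-Carlo rollouts through models sampled from the
    current belief; a crude stand-in for full belief-space tree search."""
    def rollout(s, a, d):
        ret, discount = 0.0, 1.0
        while d > 0:
            s2 = belief.sample_next_state(s, a)
            ret += discount * reward_fn(s, a, s2)
            discount *= gamma
            s, a, d = s2, random.randrange(n_actions), d - 1
        return ret

    returns = [sum(rollout(state, a, depth) for _ in range(n_rollouts)) / n_rollouts
               for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: returns[a])

In the paper, planning of this kind is extended with temporal abstraction: macro actions give rise to a hierarchical POMDP formulation, which is solved online by a hierarchical sample-based planner rather than the flat rollout over primitive actions used above.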
Acknowledgements
The authors are grateful to the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (2014R1A1A2057735), for its support of this work.
Cite this article
Vien, N.A., Lee, S. & Chung, T. Bayes-adaptive hierarchical MDPs. Appl Intell 45, 112–126 (2016). https://doi.org/10.1007/s10489-015-0742-2