Abstract
Hierarchical algorithms for Markov decision processes have proved useful in problem domains with multiple subtasks. Although existing hierarchical approaches are strong in task decomposition, they are weak in task abstraction, which is more important for task analysis and modeling. In this paper, we propose a task-oriented design that strengthens task abstraction. Our approach learns an episodic task model from the problem domain, with which the planner achieves the same control effect as the original model, but with a more concise structure and substantially improved performance. According to our analysis and experimental evaluation, our approach outperforms existing hierarchical algorithms such as MAXQ and HEXQ.
References
Boutilier C, Dearden R, Goldszmidt M (1995) Exploiting structure in policy construction. In: Proceedings of IJCAI, pp 1104–1113
Deák F, Kovács A, Váncza J, Dobrowiecki TP (2001) Hierarchical knowledge-based process planning in manufacturing. In: Proceedings of the IFIP 11 international PROLAMAT conference on digital enterprise, pp 428–439
Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: ICML, San Francisco, CA, USA, pp 118–126
Dietterich TG (2000) Hierarchical reinforcement learning with the MAXQ value function decomposition. J Artif Intell Res 13: 227–303
Hansen EA, Zhou R (2003) Synthesis of hierarchical finite-state controllers for POMDPs. In: Proceedings of ICAPS, AAAI, pp 113–122
Hengst B (2002) Discovering hierarchy in reinforcement learning with HEXQ. In: ICML ’02: proceedings of the nineteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 243–250
Jonsson A, Barto A (2005) A causal approach to hierarchical decomposition of factored MDPs. In: ICML ’05: proceedings of the 22nd international conference on machine learning, ACM, New York, NY, USA, pp 401–408, http://doi.acm.org/10.1145/1102351.1102402
Pineau J, Roy N, Thrun S (2001) A hierarchical approach to POMDP planning and execution. In: Workshop on hierarchy and memory in reinforcement learning (ICML)
Potts D, Hengst B (2004) Discovering multiple levels of a task hierarchy concurrently. Rob Auton Syst 49(1-2): 43–55
Smith T, Simmons RG (2004) Heuristic search value iteration for POMDPs. In: Proceedings of UAI
Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1-2):181–211. http://dx.doi.org/10.1016/S0004-3702(99)00052-1
Cite this article
Lin, Y., Makedon, F. & Xu, Y. Episodic task learning in Markov decision processes. Artif Intell Rev 36, 87–98 (2011). https://doi.org/10.1007/s10462-011-9204-3