Episodic task learning in Markov decision processes

Abstract

Hierarchical algorithms for Markov decision processes have proved useful in problem domains with multiple subtasks. Although existing hierarchical approaches are strong in task decomposition, they are weak in task abstraction, which is more important for task analysis and modeling. In this paper, we propose a task-oriented design that strengthens task abstraction. Our approach learns an episodic task model from the problem domain; with this model, the planner obtains the same control effect as with the original model, but with a more concise structure and much better performance. According to our analysis and experimental evaluation, our approach outperforms existing hierarchical algorithms such as MAXQ and HEXQ.
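
A minimal, hypothetical sketch of the baseline the paper improves on may help fix ideas. The Python/NumPy code below runs value iteration on a toy flat MDP and then composes the resulting policy into a single multi-step (macro-action) transition model, the basic temporal-abstraction idea of the options framework (Sutton et al. 1999, cited below) on which hierarchical methods such as MAXQ and HEXQ build. The toy MDP, its parameters, and all identifiers are invented for illustration; this is not the paper's own episodic task model algorithm.

```python
import numpy as np

# Toy flat MDP: 4 states in a chain, 2 primitive actions.
# All numbers here are invented for illustration.
n_states, n_actions = 4, 2
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] transition probs
R = np.zeros((n_actions, n_states))            # R[a, s] expected reward

for s in range(n_states):
    P[0, s, min(s + 1, n_states - 1)] += 0.9   # action 0: move right (noisy)
    P[0, s, s] += 0.1
    P[1, s, s] = 1.0                           # action 1: stay put
R[0, n_states - 2] = 1.0                       # reward for stepping into the goal

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Standard value iteration over a flat MDP."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)    # Q[a, s]; P @ V broadcasts over actions
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(P, R)
print("values:", V)
print("greedy policy:", policy)

# Temporal abstraction: treat "follow the greedy policy for several steps"
# as one macro-action. Its composed transition model can stand in for many
# primitive steps when planning over a smaller abstract model.
P_pi = P[policy, np.arange(n_states), :]       # policy-induced transition matrix
print("three primitive steps as one abstract transition:\n",
      np.linalg.matrix_power(P_pi, 3))
```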

References

  • Boutilier C, Dearden R, Goldszmidt M (1995) Exploiting structure in policy construction. In: Proceedings of IJCAI, pp 1104–1113

  • Deák F, Kovács A, Váncza J, Dobrowiecki TP (2001) Hierarchical knowledge-based process planning in manufacturing. In: Proceedings of the IFIP 11 international PROLAMAT conference on digital enterprise, pp 428–439

  • Dietterich TG (1998) The MAXQ method for hierarchical reinforcement learning. In: Proceedings of ICML, San Francisco, CA, USA, pp 118–126

  • Dietterich TG (2000) Hierarchical reinforcement learning with the MAXQ value function decomposition. J Artif Intell Res 13: 227–303

  • Hansen EA, Zhou R (2003) Synthesis of hierarchical finite-state controllers for POMDPs. In: Proceedings of ICAPS, AAAI, pp 113–122

  • Hengst B (2002) Discovering hierarchy in reinforcement learning with HEXQ. In: ICML ’02: Proceedings of the nineteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 243–250

  • Jonsson A, Barto A (2005) A causal approach to hierarchical decomposition of factored MDPs. In: ICML ’05: Proceedings of the 22nd international conference on machine learning, ACM, New York, NY, USA, pp 401–408, http://doi.acm.org/10.1145/1102351.1102402

  • Pineau J, Roy N, Thrun S (2001) A hierarchical approach to POMDP planning and execution. In: Workshop on hierarchy and memory in reinforcement learning (ICML)

  • Potts D, Hengst B (2004) Discovering multiple levels of a task hierarchy concurrently. Rob Auton Syst 49(1-2): 43–55

  • Smith T, Simmons RG (2004) Heuristic search value iteration for POMDPs. In: Proceedings of UAI

  • Sutton RS, Precup D, Singh S (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell 112(1-2):181–211. http://dx.doi.org/10.1016/S0004-3702(99)00052-1

Author information

Correspondence to Yong Lin.

About this article

Cite this article

Lin, Y., Makedon, F. & Xu, Y. Episodic task learning in Markov decision processes. Artif Intell Rev 36, 87–98 (2011). https://doi.org/10.1007/s10462-011-9204-3
