Abstract
Task decomposition and state abstraction are crucial techniques in reinforcement learning. They allow an agent to ignore aspects of its current state that are irrelevant to its current decision, thereby speeding up dynamic programming and learning. This paper presents the SVI algorithm, which uses a dynamic Bayesian network model to construct an influence graph that indicates relationships between state variables. SVI performs state abstraction for each subtask by ignoring irrelevant state variables and lower-level subtasks. Experimental results show that the task decomposition introduced by SVI can significantly accelerate the construction of a near-optimal policy. This general framework can be applied to a broad spectrum of complex real-world problems such as robotics, industrial manufacturing, and games.
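To make the idea concrete, below is a minimal sketch (not the authors' implementation) of the two steps the abstract describes: deriving an influence graph from a DBN-style factored transition model, and computing the set of state variables relevant to a subtask. The example domain, variable names, and the backward-closure rule for relevance are illustrative assumptions.

```python
# Sketch: influence graph from a DBN transition structure, and per-subtask
# state abstraction by keeping only variables the subtask can depend on.
# All names and the toy domain are hypothetical.

from collections import defaultdict, deque

# DBN structure of a toy factored MDP: for each state variable, the set of
# state variables its next value depends on (its parents in the 2-slice DBN).
dbn_parents = {
    "taxi_pos":      {"taxi_pos"},
    "passenger_loc": {"passenger_loc", "taxi_pos"},
    "destination":   {"destination"},
}

def influence_graph(parents):
    """Add an edge u -> v whenever variable u influences the transition of v."""
    graph = defaultdict(set)
    for var, deps in parents.items():
        for dep in deps:
            if dep != var:
                graph[dep].add(var)
    return graph

def relevant_variables(parents, subtask_vars):
    """Variables a subtask may need: everything its reward/termination
    variables depend on, found by backward closure over DBN parents."""
    relevant, frontier = set(subtask_vars), deque(subtask_vars)
    while frontier:
        var = frontier.popleft()
        for dep in parents.get(var, ()):
            if dep not in relevant:
                relevant.add(dep)
                frontier.append(dep)
    return relevant

if __name__ == "__main__":
    print(dict(influence_graph(dbn_parents)))
    # A "pick up passenger" subtask whose termination depends only on
    # passenger_loc can safely ignore the destination variable.
    print(relevant_variables(dbn_parents, {"passenger_loc"}))
```

In this sketch the abstraction for a subtask is simply the projection of the state onto its relevant-variable set, which is the kind of per-subtask state reduction the paper attributes to SVI.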
Cite this article
Lasheng, Y., Zhongbin, J. & Kang, L. Research on task decomposition and state abstraction in reinforcement learning. Artif Intell Rev 38, 119–127 (2012). https://doi.org/10.1007/s10462-011-9243-9