Hybrid MDP based integrated hierarchical Q-learning

Chen, ChunLin; Dong, DaoYi; Li, Han-Xiong; Tarn, Tzyh-Jong

doi:10.1007/s11432-011-4332-6

Research Papers
Published: 11 August 2011

Volume 54, pages 2279–2294, (2011)
Science China Information Sciences

ChunLin Chen¹, DaoYi Dong²,³, Han-Xiong Li⁴ & Tzyh-Jong Tarn⁵

Abstract

As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on a hybrid Markov decision process (MDP) that uses temporal abstraction instead of a simple MDP. The learning process is naturally organized into multiple levels, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model of hierarchical Q-learning, which bridges the two levels of learning. The proposed hierarchical Q-learning scales up well, and the upper-level learning process speeds up learning. Hence this approach is an effective integrated learning and control scheme for complex problems. Several experiments are carried out using a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
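
The two learning levels named in the abstract can be illustrated with the standard update rule for each model. The sketch below is not taken from the paper, whose full text is not reproduced here. It assumes tabular value functions, the textbook one-step Q-learning rule for the MDP (lower) level, and the usual SMDP Q-learning rule for temporally extended actions at the upper level. All function, parameter, state, and action names (`q_update`, `smdp_q_update`, `alpha`, `gamma`, `"roomA"`, `"go_to_door"`) are hypothetical.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step tabular Q-learning update (MDP level).

    Q is a dict keyed by (state, action); unseen pairs default to 0.
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.9):
    """SMDP Q-learning update for a temporally extended action `o`.

    `rewards` holds the per-step rewards collected while `o` was running,
    so the action lasted k = len(rewards) primitive steps.
    """
    k = len(rewards)
    # Discounted reward accumulated during the k steps of the action.
    discounted = sum(gamma ** i * r for i, r in enumerate(rewards))
    best_next = max(Q[(s_next, p)] for p in options)
    # The bootstrap term is discounted by gamma**k, not gamma.
    Q[(s, o)] += alpha * (discounted + gamma ** k * best_next - Q[(s, o)])
    return Q[(s, o)]


if __name__ == "__main__":
    # Lower level: one primitive step in a toy gridworld (assumed setup).
    Q_low = defaultdict(float)
    print(q_update(Q_low, 0, "right", 1.0, 1, ["left", "right"]))

    # Upper level: an abstract action that took three primitive steps.
    Q_up = defaultdict(float)
    print(smdp_q_update(Q_up, "roomA", "go_to_door", [0.0, 0.0, 1.0],
                        "roomB", ["go_to_door", "stay"]))
```

The only structural difference between the two rules is the discounting: the upper level sums the reward over the duration of the abstract action and discounts the bootstrap value by `gamma ** k`. This lets an upper-level table range over far fewer state-action pairs than a flat table over the same problem.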

Author information

Authors and Affiliations

Department of Control and System Engineering, and State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
ChunLin Chen
Institute of Cyber-Systems and Control, State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, 310027, China
DaoYi Dong
School of Engineering and Information Technology, University of New South Wales at the Australian Defence Force Academy, Canberra, ACT, 2600, Australia
DaoYi Dong
Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong, 999077, China
Han-Xiong Li
Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, 63130, USA
Tzyh-Jong Tarn

Corresponding author

Correspondence to DaoYi Dong.

About this article

Cite this article

Chen, C., Dong, D., Li, HX. et al. Hybrid MDP based integrated hierarchical Q-learning. Sci. China Inf. Sci. 54, 2279–2294 (2011). https://doi.org/10.1007/s11432-011-4332-6

Received: 16 April 2009
Accepted: 11 April 2010
Published: 11 August 2011
Issue Date: November 2011
DOI: https://doi.org/10.1007/s11432-011-4332-6
