Abstract
As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: The computational complexity grows dramatically with the size of state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on the hybrid Markov decision process (MDP) using temporal abstraction instead of the simple MDP. The learning process is naturally organized into multiple levels of learning, e.g., quantitative (lower) level and qualitative (upper) level, which are modeled as MDP and semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model of hierarchical Q-learning, which bridges the two levels of learning. The proposed hierarchical Q-learning can scale up very well and speed up learning with the upper level learning process. Hence this approach is an effective integral learning and control scheme for complex problems. Several experiments are carried out using a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
Similar content being viewed by others
References
Sutton R, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998. 133–156
Feng Z Y, Liang L T, Tan L, et al. Q-learning based heterogenous network self-optimization for reconfigurable network with CPC assistance. Sci China Ser F-Inf Sci, 2009, 52: 2360–2368
He P, Jagannathan S. Reinforcement learning-based output feedback control of nonlinear systems with input constraints. IEEE Trans Syst Man Cybern Part B-Cybern, 2005, 35: 150–154
Kondo T, Ito K. A reinforcement learning with evolutionary state recruitment strategy for autonomous mobile robots control. Robot Auton Syst, 2004, 46: 111–124
Morimoto J, Doya K. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robot Auton Syst, 2001, 36: 37–51
Chen C, Dong D. Grey system based reactive navigation of mobile robots using reinforcement learning. Int J Innov Comp Inf Control, 2010, 6: 789–800
Cheng D Z. Advances in automation and control research in China. Sci China Ser F-Inf Sci, 2009, 52: 1954–1963
Yung N H C, Ye C. An intelligent mobile vehicle navigator based on fuzzy logic and reinforcement learning. IEEE Trans Syst Man Cybern Part B-Cybern, 1999, 29: 314–321
Montesanto A, Tascini G, Puliti P, et al. Navigation with memory in a partially observable environment. Robot Auton Syst, 2006, 54: 84–94
Sutton R. Learning to predict by the methods of temporal difference. Mach Learn, 1988, 3: 9–44
Watkins J C H, Dayan P. Q-learning. Mach Learn, 1992, 8: 279–292
Bertsekas D P, Tsitsiklis J N. Neuro-dynamic Programming. Belmont: Athena Scientific, 1996. 36–51
Chen C, Dong D, Chen Z. Grey reinforcement learning for incomplete information processing. Lect Notes Comput Sci, 2006, 3959: 399–407
Dong D, Chen C, Li H, et al. Quantum reinforcement learning. IEEE Trans Syst Man Cybern Part B-Cybern, 2008, 38: 1207–1220
Dong D, Chen C, Tarn T J, et al. Incoherent control of quantum systems with wavefunction controllable subspaces via quantum reinforcement learning. IEEE Trans Syst Man Cybern Part B-Cybern, 2008, 38: 957–962
Chen C, Dong D, Chen Z. Quantum computation for action selection using reinforcement learning. Int J Quantum Inf, 2006, 4: 1071–1083
Dong D, Chen C, Chen Z, et al. Quantum mechanics helps in learning for more intelligent robots. Chin Phys Lett, 2006, 23: 1691–1694
Dong D, Chen C, Zhang C, et al. Quantum robot: structure, algorithms and applications. Robotica, 2006, 24: 513–521
Jing P, Ronald J W. Increment multi-step Q-learning. Mach Learn, 1996, 22: 283–291
Mahadevan S. Average reward reinforcement learning: Foundations, algorithms and empirical results. Mach Learn, 1996, 22: 159–195
Althaus P, Christensen H I. Smooth task switching through behavior competition. Robot Auton Syst, 2003, 44: 241–249
Hallerdal M, Hallamy J. Behavior selection on a mobile robot using W-learning. In: Hallam B, Floreano D, Hallam J, et al., eds. Proceedings of the Seventh International Conference on the Simulation of Adaptive Behavior on from animals to animates, Edinburgh, UK, 2002. 93–102
Wiering M, Schmidhuber J. HQ-Learning. Adapt Behav, 1997, 6: 219–246
Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning. Discret Event Dyn Syst-Theory Appl, 2003, 13: 41–77
Chen C, Chen Z. Reinforcement learning for mobile robot: From reaction to deliberation. J Syst Eng Electron, 2005, 16: 611–617
Tsitsiklis J N, VanRoy B. An analysis of temporal-difference learning with function approximation. IEEE Trans Autom Control, 1997, 42: 674–690
Sutton R S, McAllester D, Singh S, et al. Policy gradient methods for reinforcement learning with function approximation. Adv Neural Inf Process Syst, 2000, 12: 1057–1063
Ormoneit D, Sen S. Kernel-based reinforcement learning. Mach Learn, 2002, 49: 161–178
Sutton R, Precup D, Singh S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif Intell, 1999, 112: 181–211
Parr P, Russell S. Reinforcement learning with hierarchies of machines. Adv Neural Inf Process Syst, 1998, 10: 1043–1049
Dietterich T G. Hierarchical reinforcement learning with the Maxq value function decomposition. J Artif Intell Res, 2000, 13: 227–303
Theocharous G. Hierarchical learning and planning in partially observable Markov decision processes. Dissertation for Doctoral Degree. East Lansing: Michigan State University, USA, 2002. 30–72
Chen C, Li H, Dong D. Hybrid control for autonomous mobile robot navigation-a hierarchical Q-learning algorithm. IEEE Robot Autom Mag, 2008, 15: 37–47
Kuipers B. Qualitative Reasoning: Modeling and Simulation with Incomplete Knowledge. Cambridge: MIT Press, 1994. 1–27
Berleant D, Kuipers B. Qualitative and quantitative simulation: Bridging the gap. Artif Intell, 1997, 95: 215–255
Guo M Z, Liu Y, Malec J. A new Q-learning algorithm based on the metropolis criterion. IEEE Trans Syst Man Cybern Part B-Cybern, 2004, 34: 2140–2143
Dong D, Chen C, Chu J, et al. Robust quantum-inspired reinforcement learning for robot navigation. IEEE-ASME Trans Mechatron, 2011, in press
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chen, C., Dong, D., Li, HX. et al. Hybrid MDP based integrated hierarchical Q-learning. Sci. China Inf. Sci. 54, 2279–2294 (2011). https://doi.org/10.1007/s11432-011-4332-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11432-011-4332-6