
Hybrid MDP based integrated hierarchical Q-learning

  • Research Paper
  • Published in Science China Information Sciences

Abstract

As a widely used reinforcement learning method, Q-learning suffers from the curse of dimensionality: its computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on a hybrid Markov decision process (MDP) that uses temporal abstraction instead of a simple MDP. The learning process is naturally organized into multiple levels, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP that serves as the model of hierarchical Q-learning and bridges the two levels of learning. The proposed hierarchical Q-learning scales well, and the upper-level learning process speeds up overall learning; hence the approach is an effective integrated learning and control scheme for complex problems. Experiments are carried out on a puzzle problem in a gridworld environment and on a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.



Author information

Corresponding author: DaoYi Dong.


About this article

Cite this article

Chen, C., Dong, D., Li, HX. et al. Hybrid MDP based integrated hierarchical Q-learning. Sci. China Inf. Sci. 54, 2279–2294 (2011). https://doi.org/10.1007/s11432-011-4332-6


Keywords: Navigation