Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning

  • Conference paper
Anticipatory Behavior in Adaptive Learning Systems (ABiALS 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5499)


To establish autonomous behavior in technical systems, the well-known trade-off between reactive control and deliberative planning has to be considered. In this paper, we combine both principles in a two-level hierarchical reinforcement learning scheme that enables the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior representation specified by hybrid automata, which combine continuous and discrete behavior, and this representation is used to predict (anticipate) the outcome of a sequence of actions. On the higher layer of the hierarchical scheme, the behavior is abstracted in the form of finite state automata, on which value function iteration is performed to obtain a goal-reaching sequence of subtasks. This sequence is realized on the lower layer by applying policy-gradient-based reinforcement learning to the hybrid automaton model. Iterating between both layers leads to consistent and goal-attaining behavior, as shown for a simple robot grasping task.
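The higher-layer step described in the abstract (value iteration on a finite state abstraction, yielding a sequence of subtasks) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the state names, the subtask names, the unit step costs, the discount factor `gamma`, and the helper functions `value_iteration` and `greedy_subtasks`. The lower-layer policy-gradient learning on the hybrid automaton is not shown.

```python
# Hypothetical abstract automaton for a grasping task. Each entry maps a
# discrete state to its available subtasks as {subtask: (next_state, cost)}.
# States, subtasks, and costs are illustrative assumptions.
TRANSITIONS = {
    "idle": {"approach": ("near", 1.0)},
    "near": {"align": ("aligned", 1.0), "retreat": ("idle", 1.0)},
    "aligned": {"close_gripper": ("grasped", 1.0), "retreat": ("near", 1.0)},
    "grasped": {},  # goal state, absorbing
}
GOAL = "grasped"


def value_iteration(transitions, goal, gamma=0.95, tol=1e-9, max_iter=1000):
    """Compute state values for a deterministic automaton with step costs."""
    values = {state: 0.0 for state in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in transitions.items():
            if state == goal or not actions:
                continue  # the goal keeps value 0
            best = max(-cost + gamma * values[nxt]
                       for nxt, cost in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


def greedy_subtasks(transitions, values, start, goal, gamma=0.95):
    """Follow the greedy policy from start and return the subtask sequence."""
    plan, state, visited = [], start, set()
    while state != goal:
        if state in visited or not transitions[state]:
            raise ValueError(f"no path to goal from {state!r}")
        visited.add(state)
        subtask, (nxt, _) = max(
            transitions[state].items(),
            key=lambda item: -item[1][1] + gamma * values[item[1][0]],
        )
        plan.append(subtask)
        state = nxt
    return plan


values = value_iteration(TRANSITIONS, GOAL)
plan = greedy_subtasks(TRANSITIONS, values, "idle", GOAL)
print(plan)
```

In the scheme outlined in the abstract, each subtask in such a sequence would then be handed to the lower layer for realization on the hybrid automaton model, and the two layers would be iterated until the behavior is consistent.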





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rungger, M., Ding, H., Stursberg, O. (2009). Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning. In: Pezzulo, G., Butz, M.V., Sigaud, O., Baldassarre, G. (eds) Anticipatory Behavior in Adaptive Learning Systems. ABiALS 2008. Lecture Notes in Computer Science, vol 5499. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02564-8

  • Online ISBN: 978-3-642-02565-5

  • eBook Packages: Computer Science, Computer Science (R0)
