Automatic Segmentation of Sequences through Hierarchical Reinforcement Learning

  • Chapter in: Sequence Learning
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Sequential behaviors (sequential decision processes) are fundamental to cognitive agents. The use of reinforcement learning (RL) for acquiring sequential behaviors is appropriate, and even necessary, when agents have no domain-specific a priori knowledge (Sutton 1995, Barto et al. 1995, Kaelbling et al. 1996, Bertsekas and Tsitsiklis 1996, Watkins 1989). Given the complexity and differing scales of events in the world, there is a need for hierarchical RL that can produce action sequences and subsequences corresponding to domain structures. This has been demonstrated repeatedly, both in facilitating learning and in dealing with non-Markovian dependencies, e.g., by Dayan and Hinton (1993), Kaelbling (1993), Lin (1993), Wiering and Schmidhuber (1998), Tadepalli and Dietterich (1997), Parr and Russell (1997), Dietterich (1997), and many others. Different levels of action subsequencing correspond to different levels of abstraction. Thus, subsequencing also facilitates hierarchical planning as studied in traditional AI (Sacerdoti 1974; Knoblock, Tenenberg, and Yang 1994; Sun and Sessions 1998).
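The chapter's own segmentation method is not reproduced on this page. As a rough illustration of the general idea the abstract refers to, the following minimal Python sketch (written for this page, not taken from the chapter) shows a two-level learner on a toy corridor task: a high-level Q-learner selects subgoals (candidate segment boundaries), and a low-level Q-learner acquires the primitive action subsequence that reaches each chosen subgoal. The task, the hand-picked subgoal set, and all parameter values are illustrative assumptions; the chapter is concerned with discovering such segment boundaries automatically rather than fixing them by hand as done here.

import random
N = 10                      # corridor states 0..N-1; start at state 0, overall goal at N-1
SUBGOALS = [N // 2, N - 1]  # hand-picked candidate segment boundaries (illustrative only)
ACTIONS = [-1, +1]          # primitive moves: one step left, one step right
q_low, q_high = {}, {}      # low level: (subgoal, state, action) -> value; high level: (state, subgoal) -> value
alpha, gamma, eps = 0.1, 0.95, 0.1
def choose(qtable, keys):
    # epsilon-greedy choice; returns the index of the chosen key
    if random.random() < eps:
        return random.randrange(len(keys))
    vals = [qtable.get(k, 0.0) for k in keys]
    return vals.index(max(vals))
for episode in range(2000):
    s = 0
    while s != N - 1:
        # high level: pick the subgoal where the next action subsequence should end
        g = SUBGOALS[choose(q_high, [(s, g) for g in SUBGOALS])]
        s0, steps = s, 0
        # low level: run primitive actions until the chosen subgoal is reached
        while s != g and steps < 4 * N:
            a = ACTIONS[choose(q_low, [(g, s, a) for a in ACTIONS])]
            s2 = min(max(s + a, 0), N - 1)
            r = 1.0 if s2 == g else -0.01   # low-level reward: reaching the subgoal
            nxt = 0.0 if s2 == g else max(q_low.get((g, s2, b), 0.0) for b in ACTIONS)
            old = q_low.get((g, s, a), 0.0)
            q_low[(g, s, a)] = old + alpha * (r + gamma * nxt - old)
            s, steps = s2, steps + 1
        # high level: rewarded only when the overall task is completed
        R = 1.0 if s == N - 1 else -0.05
        nxt = 0.0 if s == N - 1 else max(q_high.get((s, g2), 0.0) for g2 in SUBGOALS)
        old = q_high.get((s0, g), 0.0)
        q_high[(s0, g)] = old + alpha * (R + gamma * nxt - old)
print("subgoal values learned at the start state:",
      {g: round(q_high.get((0, g), 0.0), 3) for g in SUBGOALS})

In this sketch the high-level values over subgoals play the role of a segmentation preference: each chosen subgoal closes one subsequence of primitive actions and opens the next, so different subgoal choices induce different segmentations of the overall action sequence.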


References

  • F. Bacchus and Q. Yang, (1994). Downward refinement and the efficiency of hierarchical problem solving. Artificial Intelligence, 71(1), 43–100.

  • D. Bertsekas and J. Tsitsiklis, (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.

  • A. Cassandra, L. Kaelbling, and M. Littman, (1994). Acting optimally in partially observable stochastic domains. Proc. of 12th National Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.

  • L. Chrisman, (1993). Reinforcement learning with perceptual aliasing: the perceptual distinction approach. Proc. of AAAI, 183–188. Morgan Kaufmann, San Mateo, CA.

  • P. Dayan and G. Hinton, (1993). Feudal reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.

  • T. Dietterich, (1997). Hierarchical reinforcement learning with MAXQ value function decomposition. http://www.engr.orst.edu/~tgd/cv/pubs.html

  • J. Elman, (1990). Finding structure in time. Cognitive Science, 14, 179–212.

  • P. Frasconi, M. Gori, and G. Soda, (1995). Recurrent neural networks and prior knowledge for sequence processing. Knowledge Based Systems, 8(6), 313–332.

  • C.L. Giles, B.G. Horne, and T. Lin, (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359–1365.

  • L. Kaelbling, (1993). Hierarchical learning in stochastic domains: preliminary results. Proc. of ICML, 167–173. Morgan Kaufmann, San Francisco, CA.

  • L. Kaelbling, M. Littman, and A. Moore, (1996). Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285.

  • C. Knoblock, J. Tenenberg, and Q. Yang, (1994). Characterizing abstraction hierarchies for planning. Proc. of AAAI'94, 692–697. Morgan Kaufmann, San Mateo, CA.

  • L. Lin, (1993). Reinforcement Learning for Robots Using Neural Networks. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh.

  • A. McCallum, (1996a). Learning to use selective attention and short-term memory in sequential tasks. Proc. Conference on Simulation of Adaptive Behavior, 315–324. MIT Press, Cambridge, MA.

  • A. McCallum, (1996b). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Thesis, Department of Computer Science, University of Rochester, Rochester, NY.

  • G. Monahan, (1982). A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science, 28(1), 1–16.

  • R. Parr and S. Russell, (1995). Approximating optimal policies for partially observable stochastic domains. Proc. of IJCAI'95, 1088–1094. Morgan Kaufmann, San Mateo, CA.

  • R. Parr and S. Russell, (1997). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.

  • D. Precup, R. Sutton, and S. Singh, (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA.

  • M. Puterman, (1994). Markov Decision Processes. Wiley-Interscience, New York.

  • M. Ring, (1991). Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies. Proc. of ICML, 343–347. Morgan Kaufmann, San Francisco, CA.

  • E. Sacerdoti, (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5, 115–135.

  • J. Schmidhuber, (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.

  • J. Schmidhuber, (1993). Learning unambiguous reduced sequence descriptions. Advances in Neural Information Processing Systems, 291–298.

  • S. Singh, (1994). Learning to Solve Markovian Decision Processes. Ph.D. Thesis, University of Massachusetts, Amherst, MA.

  • E. Sondik, (1978). The optimal control of partially observable Markov processes over the infinite horizon: discounted costs. Operations Research, 26(2).

  • R. Sun and T. Peterson, (1999). Multi-agent reinforcement learning: weighting and partitioning. Neural Networks, 12(4–5), 127–153.

  • R. Sun and C. Sessions, (1998). Learning plans without a priori knowledge. Adaptive Behavior, in press. A shortened version appeared in Proceedings of WCCI-IJCNN'98, vol. 1, 1–6. IEEE Press, Piscataway, NJ.

  • R. Sutton, (1995). TD models: modeling the world at a mixture of time scales. Proc. of ICML. Morgan Kaufmann, San Francisco, CA.

  • P. Tadepalli and T. Dietterich, (1997). Hierarchical explanation-based reinforcement learning. Proc. International Conference on Machine Learning, 358–366. Morgan Kaufmann, San Francisco, CA.

  • C. Tham, (1995). Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems, 15, 247–274.

  • S. Thrun and A. Schwartz, (1995). Finding structure in reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.

  • C. Watkins, (1989). Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University, Cambridge, UK.

  • S. Whitehead and L. Lin, (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 73(1–2), 271–306.

  • M. Wiering and J. Schmidhuber, (1998). HQ-learning. Adaptive Behavior, 6(2), 219–246.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sun, R., Sessions, C. (2000). Automatic Segmentation of Sequences through Hierarchical Reinforcement Learning. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science(), vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_11

  • DOI: https://doi.org/10.1007/3-540-44565-X_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4
