Abstract
Sequential behaviors (sequential decision processes) are fundamental to cognitive agents. Reinforcement learning (RL) is an appropriate, and even necessary, means of acquiring sequential behaviors when agents have no domain-specific a priori knowledge (Sutton 1995, Barto et al. 1995, Kaelbling et al. 1996, Bertsekas and Tsitsiklis 1996, Watkins 1989). Given the complexity and differing scales of events in the world, hierarchical RL is needed to produce action sequences and subsequences that correspond to domain structures. The benefits of such hierarchies, both in facilitating learning and in dealing with non-Markovian dependencies, have been demonstrated repeatedly, e.g., by Dayan and Hinton (1993), Kaelbling (1993), Lin (1993), Wiering and Schmidhuber (1998), Tadepalli and Dietterich (1997), Parr and Russell (1997), Dietterich (1997), and many others. Different levels of action subsequencing correspond to different levels of abstraction; thus, subsequencing also facilitates hierarchical planning as studied in traditional AI (Sacerdoti 1974; Knoblock, Tenenberg, and Yang 1994; Sun and Sessions 1998).
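To make the hierarchical-RL idea concrete, below is a minimal Python sketch of a two-level tabular Q-learner in the spirit of feudal RL (Dayan and Hinton 1993) and HQ-learning (Wiering and Schmidhuber 1998). It is an illustration, not the algorithm developed in this chapter: the toy chain environment, the number of modules, the explicit TERMINATE action, and all hyperparameters are assumptions made for the example, and the learning of termination itself (the crux of automatic segmentation) is deliberately left out.

    import random

    N_STATES, GOAL = 10, 9            # toy chain MDP: states 0..9, reward only at 9
    ACTIONS = (-1, +1)                # primitive actions: step left, step right
    N_MODULES = 2                     # number of low-level "subsequence" modules
    TERMINATE = 2                     # extra low-level action: hand control back up
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
    MAX_STEPS = 500                   # safety cap per episode

    # Q_hi[s][m]   : value of dispatching module m from state s (high level)
    # Q_lo[m][s][a]: module m's value for a primitive action or TERMINATE (low level)
    Q_hi = [[0.0] * N_MODULES for _ in range(N_STATES)]
    Q_lo = [[[0.0] * 3 for _ in range(N_STATES)] for _ in range(N_MODULES)]

    def eps_greedy(q):
        return (random.randrange(len(q)) if random.random() < EPS
                else max(range(len(q)), key=q.__getitem__))

    for _ in range(2000):
        s, steps = 0, 0
        while s != GOAL and steps < MAX_STEPS:
            m = eps_greedy(Q_hi[s])          # high level: choose a module
            s0, ret, disc = s, 0.0, 1.0      # track the segment's discounted return
            while s != GOAL and steps < MAX_STEPS:
                steps += 1
                a = eps_greedy(Q_lo[m][s])
                if a == TERMINATE:           # module ends its subsequence here;
                    break                    # learning *when* to do so is omitted
                s2 = max(0, min(N_STATES - 1, s + ACTIONS[a]))
                r = 1.0 if s2 == GOAL else 0.0
                # ordinary one-step Q-learning inside the active module
                Q_lo[m][s][a] += ALPHA * (r + GAMMA * max(Q_lo[m][s2]) - Q_lo[m][s][a])
                ret += disc * r
                disc *= GAMMA
                s = s2
            # SMDP-style high-level update over the whole subsequence
            target = ret + disc * (0.0 if s == GOAL else max(Q_hi[s]))
            Q_hi[s0][m] += ALPHA * (target - Q_hi[s0][m])

    print("greedy high-level module per state:",
          [max(range(N_MODULES), key=Q_hi[s].__getitem__) for s in range(N_STATES)])

In a sketch like this, segment boundaries fall wherever the active module chooses TERMINATE; the question the chapter addresses is how such boundaries can be learned automatically, so that subsequences come to align with domain structures rather than being hand-coded.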
References
F. Bacchus and Q. Yang, (1994). Downward refinement and the efficiency of hierarchical problem solving. Artificial Intelligence, 71(1), 43–100.
D. Bertsekas and J. Tsitsiklis, (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
A. Cassandra, L. Kaelbling, and M. Littman, (1994). Acting optimally in partially observable stochastic domains. Proc. of the 12th National Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
L. Chrisman, (1993). Reinforcement learning with perceptual aliasing: the perceptual distinction approach. Proc. of AAAI, 183–188. Morgan Kaufmann, San Mateo, CA.
P. Dayan and G. Hinton, (1993). Feudal reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
T. Dietterich, (1997). Hierarchical reinforcement learning with MAXQ value function decomposition. http://www.engr.orst.edu/~tgd/cv/pubs.html
J. Elman, (1990). Finding structure in time. Cognitive Science, 14, 179–212.
P. Frasconi, M. Gori, and G. Soda, (1995). Recurrent neural networks and prior knowledge for sequence processing. Knowledge-Based Systems, 8(6), 313–332.
C.L. Giles, B.G. Horne, and T. Lin, (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359–1365.
L. Kaelbling, (1993). Hierarchical learning in stochastic domains: preliminary results. Proc. of ICML, 167–173. Morgan Kaufmann, San Francisco, CA.
L. Kaelbling, M. Littman, and A. Moore, (1996). Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285.
C. Knoblock, J. Tenenberg, and Q. Yang, (1994). Characterizing abstraction hierarchies for planning. Proc. of AAAI'94, 692–697. Morgan Kaufmann, San Mateo, CA.
L. Lin, (1993). Reinforcement Learning for Robots Using Neural Networks. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA.
A. McCallum, (1996a). Learning to use selective attention and short-term memory in sequential tasks. Proc. of the Conference on Simulation of Adaptive Behavior, 315–324. MIT Press, Cambridge, MA.
A. McCallum, (1996b). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Thesis, Department of Computer Science, University of Rochester, Rochester, NY.
G. Monahan, (1982). A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science, 28(1), 1–16.
R. Parr and S. Russell, (1995). Approximating optimal policies for partially observable stochastic domains. Proc. of IJCAI'95, 1088–1094. Morgan Kaufmann, San Mateo, CA.
R. Parr and S. Russell, (1997). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.
D. Precup, R. Sutton, and S. Singh, (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA.
M. Puterman, (1994). Markov Decision Processes. Wiley-Interscience, New York.
M. Ring, (1991). Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies. Proc. of ICML, 343–347. Morgan Kaufmann, San Francisco, CA.
E. Sacerdoti, (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5, 115–135.
J. Schmidhuber, (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.
J. Schmidhuber, (1993). Learning unambiguous reduced sequence descriptions. Advances in Neural Information Processing Systems, 291–298.
S. Singh, (1994). Learning to Solve Markovian Decision Processes. Ph.D. Thesis, University of Massachusetts, Amherst, MA.
E. Sondik, (1978). The optimal control of partially observable Markov processes over the infinite horizon: discounted costs. Operations Research, 26(2).
R. Sun and T. Peterson, (1999). Multi-agent reinforcement learning: weighting and partitioning. Neural Networks, 12(4–5), 127–153.
R. Sun and C. Sessions, (1998). Learning plans without a priori knowledge. Adaptive Behavior, in press. A shortened version appeared in Proceedings of WCCI-IJCNN'98, vol. 1, 1–6. IEEE Press, Piscataway, NJ.
R. Sutton, (1995). TD models: modeling the world at a mixture of time scales. Proc. of ICML. Morgan Kaufmann, San Francisco, CA.
P. Tadepalli and T. Dietterich, (1997). Hierarchical explanation-based reinforcement learning. Proc. of ICML, 358–366. Morgan Kaufmann, San Francisco, CA.
C. Tham, (1995). Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems, 15, 247–274.
S. Thrun and A. Schwartz, (1995). Finding structure in reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
C. Watkins, (1989). Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University, Cambridge, UK.
S. Whitehead and L. Lin, (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 73(1–2), 271–306.
M. Wiering and J. Schmidhuber, (1998). HQ-learning. Adaptive Behavior, 6(2), 219–246.