Abstract
Sequential behaviors (sequential decision processes) are fundamental to cognitive agents. Reinforcement learning (RL) is an appropriate, and even necessary, means of acquiring sequential behaviors when agents have no domain-specific a priori knowledge (Sutton 1995, Barto et al. 1995, Kaelbling et al. 1996, Bertsekas and Tsitsiklis 1996, Watkins 1989). Given the complexity and differing scales of events in the world, hierarchical RL is needed to produce action sequences and subsequences that correspond to domain structures. The benefits of such hierarchies, both in facilitating learning and in dealing with non-Markovian dependencies, have been demonstrated repeatedly, e.g., by Dayan and Hinton (1993), Kaelbling (1993), Lin (1993), Wiering and Schmidhuber (1998), Tadepalli and Dietterich (1997), Parr and Russell (1997), Dietterich (1997), and many others. Different levels of action subsequencing correspond to different levels of abstraction; thus, subsequencing also facilitates hierarchical planning as studied in traditional AI (Sacerdoti 1974; Knoblock, Tenenberg, and Yang 1994; Sun and Sessions 1998).
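To make the hierarchical-RL idea concrete, below is a minimal Python sketch of a two-level tabular Q-learner in the spirit of feudal RL (Dayan and Hinton 1993) and HQ-learning (Wiering and Schmidhuber 1998). It is an illustration, not the algorithm developed in this chapter: the toy chain environment, the number of modules, the explicit TERMINATE action, and all hyperparameters are assumptions made for the example, and the learning of termination itself (the crux of automatic segmentation) is deliberately left out.

    import random

    N_STATES, GOAL = 10, 9            # toy chain MDP: states 0..9, reward only at 9
    ACTIONS = (-1, +1)                # primitive actions: step left, step right
    N_MODULES = 2                     # number of low-level "subsequence" modules
    TERMINATE = 2                     # extra low-level action: hand control back up
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
    MAX_STEPS = 500                   # safety cap per episode

    # Q_hi[s][m]   : value of dispatching module m from state s (high level)
    # Q_lo[m][s][a]: module m's value for a primitive action or TERMINATE (low level)
    Q_hi = [[0.0] * N_MODULES for _ in range(N_STATES)]
    Q_lo = [[[0.0] * 3 for _ in range(N_STATES)] for _ in range(N_MODULES)]

    def eps_greedy(q):
        return (random.randrange(len(q)) if random.random() < EPS
                else max(range(len(q)), key=q.__getitem__))

    for _ in range(2000):
        s, steps = 0, 0
        while s != GOAL and steps < MAX_STEPS:
            m = eps_greedy(Q_hi[s])          # high level: choose a module
            s0, ret, disc = s, 0.0, 1.0      # track the segment's discounted return
            while s != GOAL and steps < MAX_STEPS:
                steps += 1
                a = eps_greedy(Q_lo[m][s])
                if a == TERMINATE:           # module ends its subsequence here;
                    break                    # learning *when* to do so is omitted
                s2 = max(0, min(N_STATES - 1, s + ACTIONS[a]))
                r = 1.0 if s2 == GOAL else 0.0
                # ordinary one-step Q-learning inside the active module
                Q_lo[m][s][a] += ALPHA * (r + GAMMA * max(Q_lo[m][s2]) - Q_lo[m][s][a])
                ret += disc * r
                disc *= GAMMA
                s = s2
            # SMDP-style high-level update over the whole subsequence
            target = ret + disc * (0.0 if s == GOAL else max(Q_hi[s]))
            Q_hi[s0][m] += ALPHA * (target - Q_hi[s0][m])

    print("greedy high-level module per state:",
          [max(range(N_MODULES), key=Q_hi[s].__getitem__) for s in range(N_STATES)])

In a sketch like this, segment boundaries fall wherever the active module chooses TERMINATE; the question the chapter addresses is how such boundaries can be learned automatically, so that subsequences come to align with domain structures rather than being hand-coded.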
References
F. Bacchus and Q. Yang, (1994). Downward refinement and the efficiency of hierarchical problem solving. Artificial Intelligence, 71(1), 43–100.
D. Bertsekas and J. Tsitsiklis, (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
A. Cassandra, L. Kaelbling, and M. Littman, (1994). Acting optimally in partially observable stochastic domains. Proc. of the 12th National Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
L. Chrisman, (1993). Reinforcement learning with perceptual aliasing: the perceptual distinction approach. Proc. of AAAI, 183–188. Morgan Kaufmann, San Mateo, CA.
P. Dayan and G. Hinton, (1993). Feudal reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
T. Dietterich, (1997). Hierarchical reinforcement learning with MAXQ value function decomposition. http://www.engr.orst.edu/~tgd/cv/pubs.html
J. Elman, (1990). Finding structure in time. Cognitive Science, 14, 179–212.
P. Frasconi, M. Gori, and G. Soda, (1995). Recurrent neural networks and prior knowledge for sequence processing. Knowledge-Based Systems, 8(6), 313–332.
C.L. Giles, B.G. Horne, and T. Lin, (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359–1365.
L. Kaelbling, (1993). Hierarchical learning in stochastic domains: preliminary results. Proc. of ICML, 167–173. Morgan Kaufmann, San Francisco, CA.
L. Kaelbling, M. Littman, and A. Moore, (1996). Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285.
C. Knoblock, J. Tenenberg, and Q. Yang, (1994). Characterizing abstraction hierarchies for planning. Proc. of AAAI'94, 692–697. Morgan Kaufmann, San Mateo, CA.
L. Lin, (1993). Reinforcement Learning for Robots Using Neural Networks. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA.
A. McCallum, (1996a). Learning to use selective attention and short-term memory in sequential tasks. Proc. of the Conference on Simulation of Adaptive Behavior, 315–324. MIT Press, Cambridge, MA.
A. McCallum, (1996b). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Thesis, Department of Computer Science, University of Rochester, Rochester, NY.
G. Monahan, (1982). A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science, 28(1), 1–16.
R. Parr and S. Russell, (1995). Approximating optimal policies for partially observable stochastic domains. Proc. of IJCAI'95, 1088–1094. Morgan Kaufmann, San Mateo, CA.
R. Parr and S. Russell, (1997). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.
D. Precup, R. Sutton, and S. Singh, (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA.
M. Puterman, (1994). Markov Decision Processes. Wiley-Interscience, New York.
M. Ring, (1991). Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies. Proc. of ICML, 343–347. Morgan Kaufmann, San Francisco, CA.
E. Sacerdoti, (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5, 115–135.
J. Schmidhuber, (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.
J. Schmidhuber, (1993). Learning unambiguous reduced sequence descriptions. Advances in Neural Information Processing Systems, 291–298.
S. Singh, (1994). Learning to Solve Markovian Decision Processes. Ph.D. Thesis, University of Massachusetts, Amherst, MA.
E. Sondik, (1978). The optimal control of partially observable Markov processes over the infinite horizon: discounted costs. Operations Research, 26(2).
R. Sun and T. Peterson, (1999). Multi-agent reinforcement learning: weighting and partitioning. Neural Networks, 12(4–5), 127–153.
R. Sun and C. Sessions, (1998). Learning plans without a priori knowledge. Adaptive Behavior, in press. A shortened version appeared in Proceedings of WCCI-IJCNN'98, vol. 1, 1–6. IEEE Press, Piscataway, NJ.
R. Sutton, (1995). TD models: modeling the world at a mixture of time scales. Proc. of ICML. Morgan Kaufmann, San Francisco, CA.
P. Tadepalli and T. Dietterich, (1997). Hierarchical explanation-based reinforcement learning. Proc. of ICML, 358–366. Morgan Kaufmann, San Francisco, CA.
C. Tham, (1995). Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems, 15, 247–274.
S. Thrun and A. Schwartz, (1995). Finding structure in reinforcement learning. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
C. Watkins, (1989). Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University, Cambridge, UK.
S. Whitehead and L. Lin, (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 73(1–2), 271–306.
M. Wiering and J. Schmidhuber, (1998). HQ-learning. Adaptive Behavior, 6(2), 219–246.