Abstract
We present the Q-Cut algorithm, a graph-theoretic approach for the automatic detection of sub-goals in a dynamic environment, which is used to accelerate the Q-Learning algorithm. The learning agent creates an on-line map of the process history and uses an efficient Max-Flow/Min-Cut algorithm to identify bottlenecks. Policies for reaching these bottlenecks are learned separately and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks to partition the state space, a step necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
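As a rough illustration of the bottleneck-detection step described above, the sketch below builds a directed graph from the agent's observed transitions and runs an s-t Max-Flow/Min-Cut computation on it. It is a minimal sketch only: the use of networkx, the frequency-based edge capacities, the choice of source and target states, and the helper name find_bottleneck_edges are illustrative assumptions, not the authors' exact procedure.

    import networkx as nx
    from collections import Counter

    def find_bottleneck_edges(transitions, source, target):
        # transitions: iterable of (state, next_state) pairs from the agent's history
        counts = Counter(transitions)
        G = nx.DiGraph()
        for (u, v), n in counts.items():
            G.add_edge(u, v, capacity=float(n))  # frequency-weighted capacity (assumption)
        # Max-Flow/Min-Cut between a designated source and target state
        cut_value, (reachable, non_reachable) = nx.minimum_cut(G, source, target)
        # Edges crossing the cut separate two densely connected regions of the
        # state space; their endpoints are candidate sub-goal (bottleneck) states.
        cut_edges = [(u, v) for u in reachable for v in G[u] if v in non_reachable]
        return cut_value, cut_edges

    # Toy two-room history in which "door" acts as the bottleneck state
    history = [("a", "b"), ("b", "a"), ("b", "door"),
               ("door", "c"), ("c", "d"), ("d", "c")]
    print(find_bottleneck_edges(history, "a", "d"))

In the full algorithm, a policy for reaching each detected bottleneck would then be learned and added to the agent's action set as an option (macro-action).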
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Menache, I., Mannor, S., Shimkin, N. (2002). Q-Cut—Dynamic Discovery of Sub-goals in Reinforcement Learning. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Machine Learning: ECML 2002. Lecture Notes in Computer Science, vol 2430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36755-1_25
DOI: https://doi.org/10.1007/3-540-36755-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44036-9
Online ISBN: 978-3-540-36755-0