
Employing Automatic Temporal Abstractions to Accelerate Utile Suffix Memory Algorithm

  • Conference paper
Multiagent System Technologies (MATES 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8732)


Abstract

The main objective of memory-based reinforcement learning algorithms for hidden-state problems is to overcome state aliasing by maintaining a form of short-term memory during learning. The extended sequence tree method, on the other hand, is a sequence-based automated temporal abstraction mechanism that can be attached to a reinforcement learning algorithm. Assuming a fully observable problem setting, it searches the solution space for useful sub-policies that can be reused as timed actions, yielding significant savings in learning time. This paper presents a way to extend a well-known memory-based, model-free reinforcement learning algorithm, namely Utile Suffix Memory, with a modified version of the extended sequence tree method. In this way, the learning speed of the algorithm is increased under certain conditions. The improvement is demonstrated empirically through experiments on several benchmark problems.
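
The following is a minimal, illustrative sketch (not the authors' implementation) of the core idea the abstract refers to: an agent facing state aliasing keys its value estimates on a suffix of the recent action-observation history rather than on the raw observation alone. The fixed suffix length, the class name, and all parameters are assumptions made for brevity; the actual Utile Suffix Memory algorithm grows a suffix tree adaptively, and the paper additionally plugs in temporal abstractions discovered by a modified extended sequence tree.

```python
# Hypothetical sketch of a suffix-memory agent; names and parameters are
# illustrative assumptions, not the paper's code.
import random
from collections import defaultdict, deque

class SuffixMemoryAgent:
    def __init__(self, actions, suffix_len=2, alpha=0.1, gamma=0.95, eps=0.1):
        self.actions = list(actions)
        self.suffix_len = suffix_len      # fixed depth, a stand-in for USM's learned tree
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = defaultdict(float)       # Q[(history_suffix, action)]
        self.history = deque(maxlen=2 * suffix_len)
        self.prev = None

    def _state(self):
        # The internal "state" is the recent action-observation suffix,
        # which lets the agent tell apart observations that look identical.
        return tuple(self.history)

    def observe(self, obs):
        self.history.append(('obs', obs))

    def act(self):
        s = self._state()
        if random.random() < self.eps:
            a = random.choice(self.actions)
        else:
            a = max(self.actions, key=lambda b: self.q[(s, b)])
        self.prev = (s, a)                # remembered for the next update
        self.history.append(('act', a))
        return a

    def learn(self, reward, done):
        # One-step Q-learning update over suffix-based states.
        s, a = self.prev
        s2 = self._state()
        best_next = 0.0 if done else max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])
```

In a typical interaction loop, one would call observe(obs) with the current observation, act() to select an action, and then, after the environment returns a reward and the next observation, observe(next_obs) followed by learn(reward, done).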






Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Çilden, E., Polat, F. (2014). Employing Automatic Temporal Abstractions to Accelerate Utile Suffix Memory Algorithm. In: Müller, J.P., Weyrich, M., Bazzan, A.L.C. (eds) Multiagent System Technologies. MATES 2014. Lecture Notes in Computer Science, vol 8732. Springer, Cham. https://doi.org/10.1007/978-3-319-11584-9_11


  • DOI: https://doi.org/10.1007/978-3-319-11584-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11583-2

  • Online ISBN: 978-3-319-11584-9

  • eBook Packages: Computer Science, Computer Science (R0)
