
ARKAQ-Learning: Autonomous State Space Segmentation and Policy Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3733)

Abstract

A real-world environment is often only partially observable by an agent, whether because of noisy sensors or incomplete perception. Autonomous strategy planning under such uncertainty poses two major challenges: first, autonomously segmenting the state space for a given task; second, evolving complex behaviors that deal with each state segment. This paper proposes a new approach, ARKAQ-Learning (ART 2-A networks augmented with Kalman Filters and Q-Learning), that addresses both by combining these techniques. The algorithm is online and has low space and computational complexity. It was run on several well-known partially observable Markov decision process (POMDP) problems. The World Model Generator revealed the hidden states, mapping the non-Markovian model onto a Markovian internal state space, and the Policy Generator built the optimal policy on that internal Markovian model.
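The abstract only sketches the architecture, so the following is a minimal, hypothetical Python illustration of the two stages it names: a World Model Generator that Kalman-filters noisy observations and lets an ART 2-A-style network segment them into discrete internal states, and a Policy Generator that runs tabular Q-learning over those states. Everything below (the ScalarKalman and Art2A classes, the env_step toy POMDP, and all parameter values) is an assumption made for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class ScalarKalman:
    """Minimal 1-D Kalman filter that smooths a noisy observation stream."""

    def __init__(self, process_var=1e-2, obs_var=0.25):
        self.x, self.p = 0.0, 1.0          # state estimate and its variance
        self.q, self.r = process_var, obs_var

    def update(self, z):
        self.p += self.q                   # predict: variance grows
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct toward observation z
        self.p *= 1.0 - k
        return self.x


class Art2A:
    """Simplified ART 2-A-style clustering: inputs are normalized; an input
    joins the best-matching prototype if their similarity exceeds the
    vigilance threshold, otherwise it founds a new category (internal state)."""

    def __init__(self, vigilance=0.9, lr=0.1):
        self.protos, self.rho, self.lr = [], vigilance, lr

    def categorize(self, x):
        x = x / (np.linalg.norm(x) + 1e-12)
        if self.protos:
            sims = [float(p @ x) for p in self.protos]
            j = int(np.argmax(sims))
            if sims[j] >= self.rho:        # resonance: adapt the winner
                p = (1 - self.lr) * self.protos[j] + self.lr * x
                self.protos[j] = p / (np.linalg.norm(p) + 1e-12)
                return j
        self.protos.append(x)              # mismatch: new internal state
        return len(self.protos) - 1


# Toy POMDP (hypothetical): a hidden state in {+1, -1} occasionally flips;
# the agent sees it only through Gaussian noise and is rewarded for the
# action that matches the hidden state's sign.
hidden = 1.0

def env_step(action):
    global hidden
    reward = 1.0 if (action == 0) == (hidden > 0) else -1.0
    if rng.random() < 0.05:
        hidden = -hidden
    return hidden + rng.normal(0.0, 0.5), reward


# ARKAQ-style loop: filter the observation, segment it into an internal
# state, and run tabular Q-learning over the discovered internal states.
kf, art, Q = ScalarKalman(), Art2A(), {}
alpha, gamma, eps, n_actions = 0.1, 0.9, 0.1, 2

obs, _ = env_step(0)
s = art.categorize(np.array([kf.update(obs)]))
for _ in range(5000):
    Q.setdefault(s, np.zeros(n_actions))
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    obs, r = env_step(a)
    s2 = art.categorize(np.array([kf.update(obs)]))
    Q.setdefault(s2, np.zeros(n_actions))
    Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
    s = s2

print(f"internal states discovered: {len(art.protos)}")
print({state: q.round(2).tolist() for state, q in Q.items()})
```

On this toy problem the clusterer settles on two internal states, one per hidden sign, which is the kind of hidden-state recovery the abstract attributes to the World Model Generator; the Q-table over those two states then plays the role of the Policy Generator.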





Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sardağ, A., Akın, H.L. (2005). ARKAQ-Learning: Autonomous State Space Segmentation and Policy Generation. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_54


  • DOI: https://doi.org/10.1007/11569596_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29414-6

  • Online ISBN: 978-3-540-32085-2

  • eBook Packages: Computer Science (R0)
