Abstract
The aim of this chapter is to provide a series of tricks and recipes for neural state estimation, particularly for real-world applications of reinforcement learning. We use various recurrent neural network topologies because they allow the identification of the continuous-valued, possibly high-dimensional state space of complex dynamical systems. Recurrent neural networks explicitly account for time and memory and can, in principle, model any type of dynamical system. These capabilities make recurrent neural networks a suitable tool for approximating a Markovian state space of a dynamical system. In a second step, reinforcement learning methods can be applied to the estimated state to solve a given control problem. Besides the core trick of using a recurrent neural network for state estimation, we address several issues that arise in real-world problems, such as large sets of observables and long-term dependencies.
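As a minimal sketch of the underlying idea (not the architecture used in the chapter), the NumPy example below unfolds a simple Elman-style recurrent network over a trajectory of observations and actions and trains it by backpropagation through time to predict the next observation. A hidden state that supports this prediction can serve as an approximately Markovian state for a downstream reinforcement learning method. All dimensions, the toy system, and the training details are illustrative assumptions.

```python
# Minimal sketch: recurrent neural state estimation for a partially
# observable system. An Elman-style RNN compresses the history of
# observations and actions into a hidden vector h_t; it is trained to
# predict the next observation. (Illustrative assumptions throughout,
# not the chapter's exact architecture.)
import numpy as np

rng = np.random.default_rng(0)

obs_dim, act_dim, hidden_dim = 3, 1, 16
in_dim = obs_dim + act_dim

# Parameters of the recurrent state estimator.
W_x = rng.normal(0, 0.1, (hidden_dim, in_dim))      # input -> hidden
W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))  # hidden -> hidden (recurrence)
b_h = np.zeros(hidden_dim)
W_y = rng.normal(0, 0.1, (obs_dim, hidden_dim))     # hidden -> next-observation prediction
b_y = np.zeros(obs_dim)


def forward(observations, actions):
    """Unfold the network over a trajectory; return hidden states and predictions."""
    h = np.zeros(hidden_dim)
    hs, preds = [], []
    for t in range(len(actions)):
        x = np.concatenate([observations[t], actions[t]])
        h = np.tanh(W_x @ x + W_h @ h + b_h)   # recurrent state update
        preds.append(W_y @ h + b_y)            # prediction of o_{t+1}
        hs.append(h)
    return hs, preds


def bptt_step(observations, actions, lr=0.05):
    """One gradient step of backpropagation through time on the
    next-observation prediction error."""
    T = len(actions)
    hs, preds = forward(observations, actions)
    gW_x, gW_h, gb_h = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(b_h)
    gW_y, gb_y = np.zeros_like(W_y), np.zeros_like(b_y)
    dh_next = np.zeros(hidden_dim)
    for t in reversed(range(T)):
        dy = 2.0 * (preds[t] - observations[t + 1]) / T      # d(MSE)/d(prediction)
        gW_y += np.outer(dy, hs[t]); gb_y += dy
        dh = W_y.T @ dy + dh_next                            # from output and future steps
        dz = dh * (1.0 - hs[t] ** 2)                         # through tanh
        h_prev = hs[t - 1] if t > 0 else np.zeros(hidden_dim)
        x = np.concatenate([observations[t], actions[t]])
        gW_x += np.outer(dz, x); gW_h += np.outer(dz, h_prev); gb_h += dz
        dh_next = W_h.T @ dz                                 # propagate to earlier time steps
    for p, g in ((W_x, gW_x), (W_h, gW_h), (b_h, gb_h), (W_y, gW_y), (b_y, gb_y)):
        p -= lr * g                                          # in-place gradient descent update
    return float(np.mean([(preds[t] - observations[t + 1]) ** 2 for t in range(T)]))


# Toy partially observable system (assumption, for illustration only):
# a hidden phase is advanced by the action; only nonlinear functions of it are observed.
T = 40
phase = np.cumsum(rng.uniform(-0.3, 0.3, T + 1))
observations = [np.array([np.cos(p), np.sin(2 * p), 0.0]) for p in phase]
actions = [np.array([phase[t + 1] - phase[t]]) for t in range(T)]

for epoch in range(200):
    loss = bptt_step(observations, actions)

# The hidden states approximate a Markovian state; they would be the input
# to a reinforcement learning method (e.g. fitted Q iteration).
state_estimates, _ = forward(observations, actions)
print("prediction loss:", round(loss, 4), "state dim:", state_estimates[-1].shape[0])
```

In practice, the chapter relies on more elaborate recurrent topologies and training tricks; the sketch only illustrates how a recurrent hidden state, trained on observation and action histories, can stand in for the unobserved Markovian state that a reinforcement learning method requires.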
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Duell, S., Udluft, S., Sterzing, V. (2012). Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_38
DOI: https://doi.org/10.1007/978-3-642-35289-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8