Epoch-Incremental Queue-Dyna Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5097)

Abstract

Basic reinforcement learning algorithms such as Q-learning are characterized by a short, inexpensive single learning step, but the number of epochs needed to reach the optimal policy is unsatisfactory. Many methods reduce the number of required epochs, such as TD(λ > 0), Dyna or prioritized sweeping, but their learning time is considerable. This paper proposes a combination of the Q-learning algorithm, performed in incremental mode, with an acceleration method executed in epoch mode that is based on the environment model and the distance to the terminal state. This approach keeps the single learning step short while achieving efficiency comparable with Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q and prioritized sweeping in experiments on three maze tasks. The learning time and the number of epochs needed to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
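
The abstract only outlines the approach; the minimal Python sketch below illustrates one plausible reading of it, not the author's actual algorithm. It pairs an ordinary incremental Q-learning update, applied after every real step, with an epoch-mode phase that replays the learned environment model backwards from the terminal state in order of distance to it. All names (EpochIncrementalAgent, incremental_step, epoch_update) and the breadth-first ordering over recorded predecessors are assumptions made for illustration.

```python
import random
from collections import defaultdict, deque

class EpochIncrementalAgent:
    """Hypothetical sketch: tabular Q-learning plus an epoch-mode,
    model-based acceleration phase (assumption based on the abstract)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)           # Q[(state, action)] -> value estimate
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.predecessors = defaultdict(set)  # next_state -> {(state, action), ...}

    def select_action(self, state):
        # epsilon-greedy policy over the current Q estimates
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def incremental_step(self, s, a, r, s_next):
        # short, ordinary one-step Q-learning update after every real interaction
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
        # record the experienced transition for the epoch-mode phase
        self.model[(s, a)] = (r, s_next)
        self.predecessors[s_next].add((s, a))

    def epoch_update(self, terminal_state):
        # epoch-mode acceleration: sweep the learned model backwards from the
        # terminal state, updating state-action pairs in order of increasing
        # distance to it (breadth-first over recorded predecessors)
        visited, frontier = {terminal_state}, deque([terminal_state])
        while frontier:
            state = frontier.popleft()
            for (s, a) in self.predecessors[state]:
                r, s_next = self.model[(s, a)]
                best_next = max(self.Q[(s_next, b)] for b in self.actions)
                self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
                if s not in visited:
                    visited.add(s)
                    frontier.append(s)
```

In a maze task such as those used in the experiments, select_action and incremental_step would be called on every move, and epoch_update once per epoch, when the goal state is reached.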

Editor information

Leszek Rutkowski, Ryszard Tadeusiewicz, Lotfi A. Zadeh, Jacek M. Zurada


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zajdel, R. (2008). Epoch-Incremental Queue-Dyna Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. ICAISC 2008. Lecture Notes in Computer Science (LNAI), vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_109

  • DOI: https://doi.org/10.1007/978-3-540-69731-2_109

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69572-1

  • Online ISBN: 978-3-540-69731-2

  • eBook Packages: Computer Science (R0)
