Temporal Difference Coding in Reinforcement Learning

Iwata, Kazunori; Ikeda, Kazushi

doi:10.1007/978-3-540-45080-1_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

In this paper, we regard the sequence of returns as the output of a parametric compound source. The coding rate of this source indicates the amount of information about the return, so the information gain concerning future information is given by the sum of the discounted coding rates. Accordingly, we formulate a temporal difference learning method for estimating the expected information gain and prove that the estimate converges under certain conditions. As an example application, we propose using the ratio w of return loss to information gain in probabilistic action selection strategies. Experiments show that our w-based strategy performs well compared with the conventional Q-based strategy.
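The abstract treats the information gain as a discounted sum of per-step coding rates, estimated by temporal difference learning. Below is a minimal Python sketch, not the authors' algorithm, assuming a tabular TD(0) update I(s) <- I(s) + alpha * (c + gamma * I(s') - I(s)), where c is the coding rate observed on the transition from s to s'. The helper laplace_code_length is a hypothetical stand-in for the coding rate. It returns the ideal code length, in bits, of a discretized symbol under a sequential Laplace (add-one) estimator, and is not the parametric compound source used in the paper. The names td_information_gain, laplace_code_length, gamma and alpha are illustrative assumptions.

```python
import math
from collections import defaultdict


def laplace_code_length(counts, symbol, alphabet_size):
    """Ideal code length in bits of `symbol` under a sequential Laplace
    (add-one) estimator. HYPOTHETICAL stand-in for the paper's coding rate."""
    total = sum(counts.values())
    prob = (counts.get(symbol, 0) + 1) / (total + alphabet_size)
    return -math.log2(prob)


def td_information_gain(episodes, gamma=0.9, alpha=0.1):
    """Tabular TD(0) estimate of the discounted sum of coding rates.

    Each episode is a list of (state, coding_rate, next_state) tuples.
    next_state is None at the end of an episode, where the gain is 0.
    ASSUMPTION: a plain TD(0) rule, not necessarily the paper's update.
    """
    gain = defaultdict(float)  # I(s), initialised to 0 for unseen states
    for episode in episodes:
        for state, rate, next_state in episode:
            next_gain = 0.0 if next_state is None else gain[next_state]
            td_error = rate + gamma * next_gain - gain[state]
            gain[state] += alpha * td_error
    return dict(gain)


if __name__ == "__main__":
    # Deterministic chain 0 -> 1 -> end with coding rates of 1.0 and 2.0 bits.
    episode = [(0, 1.0, 1), (1, 2.0, None)]
    estimate = td_information_gain([episode] * 2000, gamma=0.9, alpha=0.1)
    print(round(estimate[0], 3), round(estimate[1], 3))  # 2.8 2.0
    # Code length of "a" after seeing it 3 times in a binary alphabet.
    print(round(laplace_code_length({"a": 3}, "a", 2), 4))  # 0.3219
```

In the chain example the exact gains are 2.0 for state 1 and 1.0 + 0.9 * 2.0 = 2.8 for state 0, and the TD estimates approach these values because the coding-rate signal plays the role that reward plays in ordinary value estimation.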

Author information

Authors and Affiliations

Department of Systems Science, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Kazunori Iwata & Kazushi Ikeda

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

About this paper

Cite this paper

Iwata, K., Ikeda, K. (2003). Temporal Difference Coding in Reinforcement Learning. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_30

DOI: https://doi.org/10.1007/978-3-540-45080-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive
