Abstract
Foster and Vovk proved relative loss bounds for linear regression in which the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning proceeds in a sequence of trials in which the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss, and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again, the difference of the losses grows logarithmically with the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are drawn from an unknown distribution. For linear regression, a corresponding lower bound shows that this expected bound cannot be improved substantially.
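The setting described above can be illustrated with a small numerical sketch. The code below is not the algorithm analyzed in the paper; it is a minimal illustration, under assumed synthetic data, of the quantities involved: targets that are discounted sums of future reinforcement signals, an on-line forecaster that predicts with a regularized least-squares fit of the past examples (in the spirit of the linear-regression forecasters of Foster and Vovk), and the "relative loss", i.e. the on-line algorithm's total square loss minus the total square loss of the best fixed linear predictor chosen in hindsight. Note that in true temporal-difference learning the targets are not directly observable at trial `t`; here they are computed offline purely to make the loss comparison concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.9    # discount factor (assumed)
T, d = 200, 3  # number of trials and feature dimension (assumed)

# Synthetic instances x_t and reinforcement signals r_t (pure illustration).
X = rng.normal(size=(T, d))
w_hidden = np.array([0.5, -1.0, 2.0])
r = X @ w_hidden + 0.1 * rng.normal(size=T)

# Targets: discounted sums of future reinforcements,
#   y_t = sum_{s >= t} gamma^(s - t) * r_s,
# computed by a backward recursion y_t = r_t + gamma * y_{t+1}.
y = np.zeros(T)
acc = 0.0
for t in range(T - 1, -1, -1):
    acc = r[t] + gamma * acc
    y[t] = acc

# On-line forecaster: at each trial, predict with the ridge-regularized
# least-squares solution over the examples seen so far, then observe y_t.
A = np.eye(d)      # regularized covariance: I + sum of x_s x_s^T
b = np.zeros(d)    # sum of y_s x_s
online_loss = 0.0
for t in range(T):
    w = np.linalg.solve(A, b)
    online_loss += (X[t] @ w - y[t]) ** 2
    A += np.outer(X[t], X[t])
    b += y[t] * X[t]

# Best fixed linear predictor in hindsight (least squares over all trials).
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
offline_loss = np.sum((X @ w_star - y) ** 2)

# The relative loss: the quantity bounded (logarithmically in T) in the paper.
print(online_loss - offline_loss)
```

The gap `online_loss - offline_loss` stays small relative to the total losses because the on-line forecaster's estimate converges quickly toward the hindsight-optimal weight vector; the paper's contribution is to prove that, for its temporal-difference algorithm, this gap is logarithmic in the number of trials even for worst-case sequences.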
References
Azoury, K., & Warmuth, M. K. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3), 211–246.
Bollobás, B. (1999). Linear analysis: An introductory course. Cambridge: Cambridge University Press.
Boyan, J. (1999). Least-squares temporal difference learning. In Proceedings of the Sixteenth International Conference on Machine Learning (pp. 63–70). San Francisco: Morgan Kaufmann.
Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1/3), 33–57.
Cesa-Bianchi, N., Long, P., & Warmuth, M. K. (1996). Worst-case quadratic loss bounds for on-line predictions of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7, 604–619.
Forster, J. (1999). On relative loss bounds in generalized linear regression. In Proceedings of the Twelfth International Symposium on Fundamentals of Computation Theory (pp. 269–280). Berlin: Springer.
Forster, J. (2001). Ph.D. Thesis, Fakultät für Mathematik, Ruhr-Universität Bochum.
Foster, D. P. (1991). Prediction in the worst case. The Annals of Statistics, 19, 1084–1090.
Graps, A. L. (1995). An introduction to wavelets. IEEE Computational Sciences and Engineering, 2, 50–61.
Hassibi, B., Kivinen, J., & Warmuth, M. K. (1995). Unpublished manuscript. Department of Computer Science, University of California, Santa Cruz.
Herbster, M., & Warmuth, M. K. (1998). Tracking the best regressor. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 24–31). New York: ACM Press.
Kivinen, J., & Warmuth, M. K. (1997). Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 133, 1–64.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in Pascal. Cambridge: Cambridge University Press.
Rektorys, K. (1994). Survey of applicable mathematics, 2nd rev. edn. Boston: Kluwer Academic Publishers.
Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression learning algorithm in dual variables. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 515–521). San Francisco: Morgan Kaufmann.
Schapire, R. E., & Warmuth, M. K. (1996). On the worst-case analysis of temporal-difference learning algorithms. Machine Learning, 22(1/3), 95–121.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Vovk, V. (1997). Competitive on-line linear regression. Technical Report CSD-TR-97-13, Department of Computer Science, Royal Holloway, University of London.
Walker, J. S. (1996). Fast Fourier transforms, 2nd edn. New York: CRC Press.
Widrow, B., & Stearns, S. (1985). Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall.
Forster, J., Warmuth, M.K. Relative Loss Bounds for Temporal-Difference Learning. Machine Learning 51, 23–50 (2003). https://doi.org/10.1023/A:1021825927902