Abstract
Foster and Vovk proved relative loss bounds for linear regression in which the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning proceeds in a sequence of trials in which the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss, and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again, the difference of the losses grows logarithmically with the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are drawn from an unknown distribution. For linear regression, a corresponding lower bound shows that this expected bound cannot be improved substantially.
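The setting described above can be illustrated with a small numerical sketch. The code below is not the algorithm analyzed in the paper; it is a minimal illustration, under assumed synthetic data, of the quantities involved: targets that are discounted sums of future reinforcement signals, an on-line forecaster that predicts with a regularized least-squares fit of the past examples (in the spirit of the linear-regression forecasters of Foster and Vovk), and the "relative loss", i.e. the on-line algorithm's total square loss minus the total square loss of the best fixed linear predictor chosen in hindsight. Note that in true temporal-difference learning the targets are not directly observable at trial `t`; here they are computed offline purely to make the loss comparison concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.9    # discount factor (assumed)
T, d = 200, 3  # number of trials and feature dimension (assumed)

# Synthetic instances x_t and reinforcement signals r_t (pure illustration).
X = rng.normal(size=(T, d))
w_hidden = np.array([0.5, -1.0, 2.0])
r = X @ w_hidden + 0.1 * rng.normal(size=T)

# Targets: discounted sums of future reinforcements,
#   y_t = sum_{s >= t} gamma^(s - t) * r_s,
# computed by a backward recursion y_t = r_t + gamma * y_{t+1}.
y = np.zeros(T)
acc = 0.0
for t in range(T - 1, -1, -1):
    acc = r[t] + gamma * acc
    y[t] = acc

# On-line forecaster: at each trial, predict with the ridge-regularized
# least-squares solution over the examples seen so far, then observe y_t.
A = np.eye(d)      # regularized covariance: I + sum of x_s x_s^T
b = np.zeros(d)    # sum of y_s x_s
online_loss = 0.0
for t in range(T):
    w = np.linalg.solve(A, b)
    online_loss += (X[t] @ w - y[t]) ** 2
    A += np.outer(X[t], X[t])
    b += y[t] * X[t]

# Best fixed linear predictor in hindsight (least squares over all trials).
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
offline_loss = np.sum((X @ w_star - y) ** 2)

# The relative loss: the quantity bounded (logarithmically in T) in the paper.
print(online_loss - offline_loss)
```

The gap `online_loss - offline_loss` stays small relative to the total losses because the on-line forecaster's estimate converges quickly toward the hindsight-optimal weight vector; the paper's contribution is to prove that, for its temporal-difference algorithm, this gap is logarithmic in the number of trials even for worst-case sequences.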
References
Azoury, K., & Warmuth, M. K. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3), 211–246.
Bollobás, B. (1999). Linear analysis: An introductory course. Cambridge: Cambridge University Press.
Boyan, J. (1999). Least-squares temporal difference learning. In Proceedings of the Sixteenth International Conference on Machine Learning (pp. 63–70). San Francisco: Morgan Kaufmann.
Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1/3), 33–57.
Cesa-Bianchi, N., Long, P., & Warmuth, M. K. (1996). Worst-case quadratic loss bounds for on-line predictions of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7, 604–619.
Forster, J. (1999). On relative loss bounds in generalized linear regression. In Proceedings of the Twelfth International Symposium on Fundamentals of Computation Theory (pp. 269–280). Berlin: Springer.
Forster, J. (2001). Ph.D. Thesis, Fakultät für Mathematik, Ruhr-Universität Bochum.
Foster, D. P. (1991). Prediction in the worst case. The Annals of Statistics, 19, 1084–1090.
Graps, A. L. (1995). An introduction to wavelets. IEEE Computational Sciences and Engineering, 2, 50–61.
Hassibi, B., Kivinen, J., & Warmuth, M. K. (1995). Unpublished manuscript. Department of Computer Science, University of California, Santa Cruz.
Herbster, M., & Warmuth, M. K. (1998). Tracking the best regressor. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 24–31). New York: ACM Press.
Kivinen, J., & Warmuth, M. K. (1997). Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 133, 1–64.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in Pascal. Cambridge: Cambridge University Press.
Rektorys, K. (1994). Survey of applicable mathematics, 2nd rev. edn. Boston: Kluwer Academic Publishers.
Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression learning algorithm in dual variables. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 515–521). San Francisco: Morgan Kaufmann.
Schapire, R. E., & Warmuth, M. K. (1996). On the worst-case analysis of temporal-difference learning algorithms. Machine Learning, 22(1/3), 95–121.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Vovk, V. (1997). Competitive on-line linear regression. Technical Report CSD-TR-97-13, Department of Computer Science, Royal Holloway, University of London.
Walker, J. S. (1996). Fast Fourier transforms, 2nd edn. New York: CRC Press.
Widrow, B., & Stearns, S. (1985). Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall.
Forster, J., Warmuth, M.K. Relative Loss Bounds for Temporal-Difference Learning. Machine Learning 51, 23–50 (2003). https://doi.org/10.1023/A:1021825927902