
Re-adapting the Regularization of Weights for Non-stationary Regression

Conference paper
Algorithmic Learning Theory (ALT 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6925)


Abstract

In standard online learning, the learner's goal is to keep its cumulative loss not much larger than that of the best-performing prediction function from some fixed class. Numerous algorithms have been shown to drive this gap arbitrarily close to zero relative to the best function chosen offline. Nevertheless, many real-world applications (such as adaptive filtering) are non-stationary in nature, and the best prediction function may drift over time rather than remain fixed. We introduce a new regression algorithm that uses a per-feature learning rate and provide a regret bound with respect to the best sequence of functions with drift. We show that as long as the cumulative drift is sub-linear in the length of the sequence, our algorithm suffers sub-linear regret as well. We also sketch an algorithm that achieves the best of both worlds: logarithmic (log T) regret in the stationary setting and sub-linear regret in the non-stationary setting. Simulations demonstrate the usefulness of our algorithm compared with other state-of-the-art approaches.
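The abstract describes online regression with per-feature learning rates that are re-adapted so the learner can keep tracking a drifting target. The sketch below illustrates that general idea with a diagonal (per-feature) second-order update plus a simple reset rule. It is a minimal, assumption-laden illustration, not the authors' exact algorithm: the class name, the parameters r and reset_threshold, and the specific reset criterion are hypothetical choices made for this example.

```python
import numpy as np


class DiagonalSecondOrderRegressor:
    """Illustrative online linear regression with per-feature learning rates.

    A minimal sketch: a diagonal recursive-least-squares style update with a
    simple reset step to cope with drift. NOT the paper's exact algorithm;
    r and reset_threshold are hypothetical parameters for this example only.
    """

    def __init__(self, dim, r=1.0, reset_threshold=1e-3):
        self.w = np.zeros(dim)        # weight vector
        self.sigma = np.ones(dim)     # per-feature learning rates (diagonal "covariance")
        self.r = r                    # regularization / noise parameter
        self.reset_threshold = reset_threshold

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, y):
        """One online round: predict, observe the target, update the weights."""
        y_hat = self.predict(x)
        denom = self.r + np.sum(self.sigma * x * x)
        # Features with more accumulated information take smaller steps.
        self.w += (self.sigma * x) * (y - y_hat) / denom
        # Shrink the per-feature learning rates (diagonal covariance update).
        self.sigma -= (self.sigma * x) ** 2 / denom
        # Re-adapt: if the rates have decayed too far, reset them so the model
        # can keep following a target that drifts over time.
        if self.sigma.min() < self.reset_threshold:
            self.sigma = np.ones_like(self.sigma)
        return y_hat


# Toy usage on a slowly drifting linear target.
reg = DiagonalSecondOrderRegressor(dim=3)
rng = np.random.default_rng(0)
for t in range(1000):
    x = rng.normal(size=3)
    w_true = np.array([1.0, -2.0, 0.5]) * (1.0 + 0.001 * t)  # drifting comparator
    reg.update(x, float(w_true @ x))
```

The reset step is what prevents the per-feature learning rates from decaying to zero, which is the intuition behind re-adapting the regularization when the best predictor is allowed to drift.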






Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vaits, N., Crammer, K. (2011). Re-adapting the Regularization of Weights for Non-stationary Regression. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science, vol. 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_12


  • DOI: https://doi.org/10.1007/978-3-642-24412-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24411-7

  • Online ISBN: 978-3-642-24412-4

  • eBook Packages: Computer Science, Computer Science (R0)
