Accelerating Stochastic Newton Method via Chebyshev Polynomial Approximation

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13936)

Abstract

To solve large-scale optimization problems arising from machine learning, stochastic Newton methods have been proposed to reduce the cost of computing the Hessian and its inverse while still maintaining fast convergence. Recently, a second-order method named LiSSA [12] was proposed to approximate the Hessian inverse with a Taylor expansion, achieving (almost) linear running time. The approach is simple and effective, yet it can be accelerated further. In this paper, we resort to the Chebyshev polynomial and its variants to approximate the Hessian inverse. While Chebyshev polynomial approximation is widely acknowledged as the optimal polynomial approximation in the deterministic setting, here we introduce it into the stochastic setting of Newton optimization. We provide a complete convergence analysis, and experiments on multiple benchmarks show that our proposed algorithms outperform LiSSA, which validates our theoretical insights.

Supported by NSFC grant 11771148 and Science and Technology Commission of Shanghai Municipality grants 20511100200, 21JC1402500 and 22DZ2229014.

References

  1. Blackard, J.A., Dean, D.J.: Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput. Electron. Agric. 24(3), 131–151 (1999)

  2. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  3. Erdogdu, M.A., Montanari, A.: Convergence rates of sub-sampled Newton methods. In: Advances in Neural Information Processing Systems, pp. 3034–3042 (2015)

  4. Golub, G.H., van Loan, C.F.: Matrix Computations, 2nd edn. The Johns Hopkins University Press, Baltimore and London (1989)

  5. Golub, G.H., Varga, R.S.: Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numer. Math. 3, 147–156 (1961)

  6. Gower, R., Goldfarb, D., Richtárik, P.: Stochastic block BFGS: squeezing more curvature out of data. In: International Conference on Machine Learning, pp. 1869–1878. PMLR (2016)

  7. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26, 315–323 (2013)

  8. LeCun, Y., Cortes, C.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/

  9. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml

  10. McCallum, A.: Real-sim (1997). https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#real-sim

  11. Moritz, P., Nishihara, R., Jordan, M.: A linearly-convergent stochastic L-BFGS algorithm. In: Artificial Intelligence and Statistics, pp. 249–258. PMLR (2016)

  12. Agarwal, N., Bullins, B., Hazan, E.: Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 18, 1–40 (2017)

  13. Pilanci, M., Wainwright, M.J.: Newton sketch: a linear-time optimization algorithm with linear-quadratic convergence. SIAM J. Optim. 27(1), 205–245 (2017)

  14. Byrd, R.H., Chin, G.M., Neveitt, W., Nocedal, J.: On the use of stochastic Hessian information in optimization methods for machine learning. SIAM J. Optim. 21(3), 977–995 (2011)

  15. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Statist. 22, 400–407 (1951)

  16. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In: International Conference on Machine Learning, pp. 64–72. PMLR (2014)

  17. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)

  18. Varga, R.S.: Matrix Iterative Analysis, 2nd edn. Springer, Berlin, Heidelberg (2000). https://doi.org/10.1007/978-3-642-05156-2

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Jianyu Pan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 336 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sha, F., Pan, J. (2023). Accelerating Stochastic Newton Method via Chebyshev Polynomial Approximation. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13936. Springer, Cham. https://doi.org/10.1007/978-3-031-33377-4_40

  • DOI: https://doi.org/10.1007/978-3-031-33377-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33376-7

  • Online ISBN: 978-3-031-33377-4

  • eBook Packages: Computer Science (R0)
