
Stochastic Regularized Newton Methods for Nonlinear Equations

Journal of Scientific Computing (2023)

Abstract

In this paper, we study stochastic regularized Newton methods for finding zeros of nonlinear equations whose exact function information is typically expensive to compute, while approximations are easily accessible via calls to stochastic oracles. To handle the potential singularity of Jacobian approximations, we compute a regularized Newton step at each iteration. We then take a unit step if it is accepted by an inexact line search condition, and a preset step otherwise. We investigate the global convergence properties and the convergence rate of the proposed algorithm with high probability. We also propose a stochastic regularized Newton method that incorporates a variance reduction technique, and we establish the corresponding sample complexities in terms of the total number of stochastic oracle calls required to find an approximate solution. Finally, we report numerical results that demonstrate the promising performance of the two proposed algorithms.
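To make the iteration concrete, the following Python snippet is a minimal sketch, not the authors' exact algorithm: it assumes a Levenberg-Marquardt-type regularization parameter proportional to the sampled residual norm and a simple residual-decrease test standing in for the paper's inexact line search condition. The names F_oracle, J_oracle, mu, sigma and preset_step are illustrative choices, not quantities defined in the paper.

```python
import numpy as np

def stochastic_regularized_newton(F_oracle, J_oracle, x0, mu=1.0,
                                  sigma=1e-4, preset_step=1e-2, max_iter=100):
    # Illustrative sketch of a stochastic regularized Newton loop.
    # F_oracle(x) returns a stochastic estimate of F(x);
    # J_oracle(x) returns a stochastic estimate of the Jacobian J(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fk = F_oracle(x)                       # sampled residual
        Jk = J_oracle(x)                       # sampled Jacobian
        lam = mu * np.linalg.norm(Fk)          # regularization tied to ||F_k||
        # Regularized Newton (Levenberg-Marquardt-type) step:
        # (J_k^T J_k + lam I) d_k = -J_k^T F_k, so the linear system stays
        # nonsingular even when the sampled Jacobian is (nearly) singular.
        d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(x.size), -Jk.T @ Fk)
        # Acceptance test on the sampled residual: take the unit step if it
        # gives sufficient decrease, otherwise fall back to the preset step.
        if np.linalg.norm(F_oracle(x + d)) <= (1.0 - sigma) * np.linalg.norm(Fk):
            x = x + d
        else:
            x = x + preset_step * d
    return x
```

With F_oracle and J_oracle built, for example, from mini-batch averages of component residuals, this loop mirrors the "unit step if accepted, preset step otherwise" rule described in the abstract, although the acceptance condition and the choice of the regularization parameter analyzed in the paper may differ in detail.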



Data Availability

The datasets Adult, Gisette and RCV1 analyzed during the current study are available at http://archive.ics.uci.edu/ml/datasets.php. The dataset CINA analyzed during the current study is available at http://www.causality.inf.ethz.ch/data/CINA.html.

References

  1. Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer, Berlin (2006)

  2. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: ICML (2016)

  3. Behling, R., Iusem, A.N.: The effect of calmness on the solution set of systems of nonlinear equations. Math. Program. 137(1), 155–165 (2013)

  4. Bianchi, P.: Ergodic convergence of a stochastic proximal point algorithm. SIAM J. Optim. 26(4), 2235–2260 (2016)

  5. Causality Workbench Team: A marketing dataset (2008). http://www.causality.inf.ethz.ch/data/CINA.html

  6. Chen, S., Pang, L.-P., Guo, F.-F., Xia, Z.-Q.: Stochastic methods based on Newton method to the stochastic variational inequality problem with constraint conditions. Math. Comput. Model. 55(3), 779–784 (2012)

  7. Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016)

  8. Fan, J.: Accelerating the modified Levenberg-Marquardt method for nonlinear equations. Math. Comput. 83(287), 1173–1187 (2013)

  9. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Advances in NIPS (pp. 545-552) (2005)

  10. Iusem, A.N., Jofré, A., Oliveira, R.I., Thompson, P.: Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 27(2), 686–724 (2017)

  11. Iusem, A.N., Jofré, A., Oliveira, R.I., Thompson, P.: Variance-based extragradient methods with line search for stochastic variational inequalities. SIAM J. Optim. 29(1), 175–206 (2019)

  12. Iusem, A.N., Jofré, A., Thompson, P.: Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 44(1), 236–263 (2018)

  13. Jiang, H., Xu, H.: Stochastic approximation approaches to the stochastic variational inequality problem. IEEE Trans. Automat. Contr. 53(6), 1462–1475 (2008)

  14. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1992)

  15. Lan, G.: First-order and Stochastic Optimization Methods for Machine Learning. Springer, Berlin (2020)

  16. Lee, J.D., Lin, Q., Ma, T., Yang, T.: Distributed stochastic variance reduced gradient methods by sampling extra data with replacement. J. Mach. Learn. Res. 18(122), 1–43 (2017)

  17. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

  18. Li, X., Zhao, T., Arora, R., Liu, H., Haupt, J.: Stochastic variance reduced optimization for nonconvex sparse learning. In: ICML (Vol. 48, pp. 917-925) (2016)

  19. Lord, G., Malham, S.J.A., Wiese, A.: Efficient strong integrators for linear stochastic systems. SIAM J. Numer. Anal. 46(6), 2892–2919 (2008)

  20. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983). https://doi.org/10.1137/0904038

  21. Mukherjee, I., Canini, K., Frongillo, R., Singer, Y.: Parallel boosting with momentum. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 17-32). Springer, Berlin, Heidelberg (2013)

  22. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  23. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: ICML (pp. 2613-2621) (2017)

  24. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin (2006)

  25. Di Nunno, G., Zhang, T.: Approximations of stochastic partial differential equations. Ann. Appl. Probab. 26(3), 1443–1466 (2016)

  26. Øksendal, B.K.: Stochastic differential equations: an introduction with applications. J. Am. Stat. Assoc. 82(399), 948 (1987)

  27. Papini, M., Binaghi, D., Canonaco, G., Pirotta, M., Restelli, M.: Stochastic variance-reduced policy gradient. In: ICML (Vol. 80, pp. 4026-4035) (2018)

  28. Paquette, C., Scheinberg, K.: A stochastic line search method with convergence rate analysis. SIAM J. Optim. 30(1), 349–376 (2020)

  29. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In: NeurIPS (pp. 1145-1153) (2016)

  30. Ross, S.M.: Introduction to Stochastic Dynamic Programming. Academic Press (1983)

  31. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)

  32. Tang, J., Ma, C.: A smoothing Newton method for solving a class of stochastic linear complementarity problems. Nonlinear Anal. Real World Appl. 12(6), 3585–3601 (2011)

  33. Tran-Dinh, Q., Pham, N.H., Nguyen, L.: Stochastic Gauss-Newton algorithms for nonconvex compositional optimization. In: ICML (2020)

  34. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)

  35. Ueda, K., Yamashita, N.: On a global complexity bound of the Levenberg-Marquardt method. J. Optim. Theory Appl. 147(3), 443–453 (2010)

  36. Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: interpolation, line search, and convergence rates. arXiv preprint arXiv:1905.09997 (2019)

  37. Wang, Z., Zhou, Y., Liang, Y., Lan, G.: Stochastic variance-reduced cubic regularization for nonconvex optimization. In: Int. Conf. Artif. Intell. Stat., pp. 2731-2740 (2019)

  38. Ypma, T.J.: Historical development of the Newton-Raphson method. SIAM Rev. 37(4), 531–551 (1995)

  39. Zhang, J., Xiao, L., Zhang, S.: Adaptive stochastic variance reduction for subsampled Newton method with cubic regularization. arXiv preprint (2018)

  40. Zhang, J., Xiao, L.: A stochastic composite gradient method with incremental variance reduction. Adv. NeurIPS 32, 9078–9088 (2019)

  41. Zhang, J., Xiao, L.: Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization. Math. Program., 1-43 (2021)

  42. Zhao, R., Fan, J.: Global complexity bound of the Levenberg-Marquardt method. Optim. Methods Softw. 31(4), 805–814 (2016)

Funding

This work was partially supported by the Major Key Project of PCL (No. PCL2022A05), the National Natural Science Foundation of China (11871453, 12271278, 12026604 and 11971089), and Dalian High-level Talent Innovation Project (2020RD09). The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Xiao Wang.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The following lemma refers to Theorem 1.6 in [34] and states concentration inequalities for vector- and matrix-valued martingales, respectively.

Lemma 5.1

Let \(({\mathcal {U}}_k)^m_{k=0}\) be a given filtration of the \(\sigma \)-algebra \({\mathcal {F}}\).

(i):

Let \((X_k)^m_{k=1}\), \(X_k: \Omega \rightarrow {\mathbb {R}}^n\), be a family of random vectors satisfying \(X_k\in {{\mathcal {U}}_k}\), and let \(\sigma \in {{\mathbb {R}}^m}\) be a given vector with \(\sigma _k\ne 0\), \(k=1,\ldots ,m\). Suppose that \({\mathbb {E}}[X_k\ |{\mathcal {U}}_{k-1}]=0\) and \({\mathbb {E}}[\Vert X_k\Vert ^2\ |{\mathcal {U}}_{k-1}]\le \sigma _k^2\) almost everywhere for all \(k\in {[m]}\). Then it holds

$$\begin{aligned} {\mathbb {E}}[\Vert \sum ^{m}_{k=1}X_k\Vert ^2\ |{\mathcal {U}}_0]\le \Vert \sigma \Vert ^2,\ {\mathbb {P}}(\Vert \sum ^{m}_{k=1}X_k\Vert \ge \tau \Vert \sigma \Vert \ |{\mathcal {U}}_0)\le \tau ^{-2},\ \ \forall \tau >0 \end{aligned}$$

almost everywhere.

(ii):

Let \((X_k)^m_{k=1}\), \(X_k: \Omega \rightarrow {\mathbb {R}}^{d_1\times d_2}\), be a sequence of random matrices satisfying \(X_k\in {{\mathcal {U}}_k}\). Suppose that \({\mathbb {E}}[X_k\ |{\mathcal {U}}_{k-1}]=0\), and there exists a positive constant R such that \(\Vert X_k\Vert \le R\) almost everywhere for all \(k\in {[m]}\). Define \(\nu ^2=\max \{\Vert \sum _{k=1}^m {\mathbb {E}}(X_kX_k^T)\Vert ,\Vert \sum _{k=1}^m {\mathbb {E}}(X_k^TX_k)\Vert \}\). Then it holds

$$\begin{aligned} {\mathbb {P}}(\Vert \sum ^{m}_{k=1}X_k\Vert \ge t\ |{\mathcal {U}}_0)\le (d_1+d_2)\cdot \exp { (\frac{-t^2/2}{\nu ^2+Rt/3} )},\ \ \forall t>0 \end{aligned}$$

almost everywhere.
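As a numerical illustration of part (i), the following Monte Carlo sketch checks that the tail probability stays below \(\tau ^{-2}\). It uses independent zero-mean Gaussian vectors, an assumed special case of the martingale-difference setting, and all parameter values (m, n, tau, trials) are hypothetical choices for the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, tau, trials = 50, 10, 2.0, 20000

# sigma_k > 0 with E||X_k||^2 = sigma_k^2; here each X_k is an independent
# zero-mean Gaussian vector, a special case of the assumptions in Lemma 5.1(i).
sigma = rng.uniform(0.5, 1.5, size=m)

exceed = 0
for _ in range(trials):
    X = rng.standard_normal((m, n)) * (sigma[:, None] / np.sqrt(n))
    if np.linalg.norm(X.sum(axis=0)) >= tau * np.linalg.norm(sigma):
        exceed += 1

# The empirical tail probability should not exceed the Chebyshev-type bound.
print(f"empirical tail = {exceed / trials:.4f}, bound tau^-2 = {tau ** -2:.4f}")
```

The bound is loose in this Gaussian case; its strength is that it requires only conditional second-moment bounds, not independence or boundedness of the increments.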

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Wang, X. & Zhang, L. Stochastic Regularized Newton Methods for Nonlinear Equations. J Sci Comput 94, 51 (2023). https://doi.org/10.1007/s10915-023-02099-4
