General multilevel adaptations for stochastic approximation algorithms of Robbins–Monro and Polyak–Ruppert type

Dereich, Steffen; Müller-Gronbach, Thomas

doi:10.1007/s00211-019-01024-y

Published: 06 February 2019

Numerische Mathematik, volume 142, pages 279–328 (2019)

Abstract

In this article we present and analyse new multilevel adaptations of classical stochastic approximation algorithms for the computation of a zero of a function $f:D \rightarrow {{\mathbb {R}}}^d$ defined on a convex domain $D\subset {{\mathbb {R}}}^d$, which is given as a parameterised family of expectations. The analysis of the error and the computational cost of our method is based on assumptions similar to those used in Giles (Oper Res 56(3):607–617, 2008) for the computation of a single expectation. Additionally, we essentially only require that $f$ satisfies a classical contraction property from stochastic approximation theory. Under these assumptions we establish error bounds in $p$th mean for our multilevel Robbins–Monro and Polyak–Ruppert schemes that decay in the computational time as fast as the classical error bounds for multilevel Monte Carlo approximations of single expectations known from Giles (Oper Res 56(3):607–617, 2008). Our approach is universal in the sense that, given multilevel implementations for a particular application, it is straightforward to implement the corresponding stochastic approximation algorithm.

References

Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Volume 22 of Applications of Mathematics (New York), vol. 22. Springer, Berlin (1990)
Duflo, M.: Algorithmes Stochastiques. Volume 23 of Mathématiques & Applications (Berlin) [Mathematics & Applications], vol. 23. Springer, Berlin (1996)
Frikha, N.: Multi-level stochastic approximation algorithms. Ann. Appl. Probab. 26, 933–985 (2016)
Gaposhkin, V.F., Krasulina, T.P.: On the law of the iterated logarithm in stochastic approximation processes. Theory Probab. Appl. 19(4), 844–850 (1974)
Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
Heinrich, S.: Multilevel Monte Carlo methods. In: Margenov, S., Waśniewski, J., Yalamov, P. (eds.) Large-Scale Scientific Computing, pp. 58–67. Springer, Berlin (2001)
Kushner, H.J., Yang, J.: Stochastic approximation with averaging of the iterates: optimal asymptotic rate of convergence for general processes. SIAM J. Control Optim. 31(4), 1045–1062 (1993)
Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications, Volume 35 of Applications of Mathematics (New York). Stochastic Modelling and Applied Probability, 2nd edn. Springer, New York (2003)
Lai, T.L.: Stochastic approximation. Ann. Stat. 31(2), 391–406 (2003). Dedicated to the memory of Herbert E. Robbins
Lai, T.L., Robbins, H.: Limit theorems for weighted sums and stochastic approximation processes. Proc. Nat. Acad. Sci. U.S.A. 75, 1068–1070 (1978)
Le Breton, A., Novikov, A.: Some results about averaging in stochastic approximation. Metrika 42(3–4), 153–171 (1995). Second International Conference on Mathematical Statistics (Smolenice Castle, 1994)
Ljung, L., Pflug, G., Walk, H.: Stochastic Approximation and Optimization of Random Systems. Volume 17 of DMV Seminar, vol. 17. Birkhäuser Verlag, Basel (1992)
Nualart, D.: The Malliavin Calculus and Related Topics. Probability and Its Applications (New York), 2nd edn. Springer, Berlin (2006)
Pelletier, M.: On the almost sure asymptotic behaviour of stochastic algorithms. Stoch. Process. Appl. 78(2), 217–244 (1998)
Pelletier, M.: Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing. Ann. Appl. Probab. 8(1), 10–44 (1998)
Polyak, B.T.: A new method of stochastic approximation type. Avtomat. i Telemekh. 51(7), 937–1008 (1998)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Ruppert, D.: Almost sure approximations to the Robbins-Monro and Kiefer-Wolfowitz processes with dependent noise. Ann. Probab. 10, 178–187 (1982)
Ruppert, D.: Stochastic Approximation. In: Ghosh, B.K., Sen, P.K. (eds.) Handbook of Sequential Analysis. Volume 118 of Statist. Textbooks Monogr., pp. 503–529. Dekker, New York (1991)

Acknowledgements

We thank two anonymous referees for their valuable comments, which improved the presentation of the material.

Author information

Authors and Affiliations

Fachbereich 10: Mathematik und Informatik, Institut für Mathematische Statistik, Westfälische Wilhelms-Universität Münster, Orléans-Ring 10, 48149, Münster, Germany
Steffen Dereich
Fakultät für Informatik und Mathematik, Universität Passau, Innstraße 33, 94032, Passau, Germany
Thomas Müller-Gronbach

Corresponding author

Correspondence to Thomas Müller-Gronbach.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Let $(\Omega ,{\mathcal {F}},{P})$ be a probability space endowed with a filtration $({\mathcal {F}}_n)_{n\in {{\mathbb {N}}}_0}$ and let $\Vert \cdot \Vert $ denote a Hilbert space norm on ${{\mathbb {R}}}^d$.

In this section we provide pth mean estimates for an adapted d-dimensional dynamical system $(\zeta _n)_{n\in {{\mathbb {N}}}_0}$ with the property that for each $n\in {{\mathbb {N}}}$, $\zeta _n$ is a zero-mean perturbation of a previsible proposal $\xi _n$ being comparable in size to $\zeta _{n-1}$. More formally, we assume that there exist a previsible d-dimensional process $(\xi _n)_{n\in {{\mathbb {N}}}}$, a d-dimensional martingale $(M_n)_{n\in {{\mathbb {N}}}_0}$ with $M_0=\zeta _0$ and a constant $c\ge 0$ such that for all $n\in {{\mathbb {N}}}$

$$\begin{aligned} \begin{aligned} \zeta _{n}&= \xi _{n}+\Delta M_n,\\ \Vert \xi _n\Vert&\le \Vert \zeta _{n-1}\Vert \vee c, \end{aligned} \end{aligned}$$

(106)

where $\Delta M_n = M_n-M_{n-1}$. Note that necessarily $\xi _n={{\mathbb {E}}}[\zeta _n|{\mathcal {F}}_{n-1}]$.

Theorem 5.1

Assume that $(\zeta _n)_{n\in {{\mathbb {N}}}_0}$ is an adapted d-dimensional process, which satisfies (106), and let $p\in [1,\infty )$. Then there exists a constant $\kappa \in (0,\infty )$, which only depends on p, such that for every $n\in {{\mathbb {N}}}_0$,

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le k\le n}\Vert \zeta _k\Vert ^p\right] \le \kappa \, \bigl ( {{\mathbb {E}}}\bigl [ [M]_n^{p/2}\bigr ] + c^p\bigr ), \end{aligned}$$

where

$$\begin{aligned} {}[M]_n=\sum _{k=1}^{n} \Vert \Delta M_k\Vert ^2 +\Vert M_0\Vert ^2. \end{aligned}$$
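To get a feeling for the bound, the following minimal Python sketch simulates a toy instance of (106) with $c=0$ and estimates both sides by Monte Carlo. The model (standard Gaussian increments, proposal $\xi_n = 0.9\,\zeta_{n-1}$) and all names such as `simulate_moments` and `shrink` are illustrative assumptions and are not taken from the article. For $p=2$, Doob's $L^2$ maximal inequality permits the Burkholder–Davis–Gundy constant 4, so the printed ratio should stay below 4 up to sampling error.

```python
import numpy as np


def simulate_moments(n_steps=200, n_paths=5000, d=2, p=2.0, shrink=0.9, seed=0):
    """Monte Carlo estimates of E[max_k ||zeta_k||^p] and E[[M]_n^{p/2}] for a toy model."""
    rng = np.random.default_rng(seed)
    zeta = rng.normal(size=(n_paths, d))        # zeta_0 = M_0
    quad_var = np.sum(zeta**2, axis=1)          # [M]_0 = ||M_0||^2
    running_max = np.linalg.norm(zeta, axis=1)
    for _ in range(n_steps):
        xi = shrink * zeta                      # previsible proposal, ||xi_n|| <= ||zeta_{n-1}|| (c = 0)
        d_m = rng.normal(size=(n_paths, d))     # centred increment, independent of the past
        zeta = xi + d_m                         # zeta_n = xi_n + Delta M_n
        quad_var += np.sum(d_m**2, axis=1)      # accumulate ||Delta M_k||^2
        running_max = np.maximum(running_max, np.linalg.norm(zeta, axis=1))
    lhs = float(np.mean(running_max**p))
    rhs = float(np.mean(quad_var ** (p / 2)))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = simulate_moments()
    print(f"E[max ||zeta_k||^p] ~ {lhs:.2f}, E[[M]_n^(p/2)] ~ {rhs:.2f}, ratio ~ {lhs / rhs:.3f}")
```

Setting `shrink=1.0` gives $\zeta_n = M_n$, so the experiment then reduces to a check of the maximal inequality for the martingale itself.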

Proof

Fix $p\in [1,\infty )$.

We first consider the case where $c=0$. Recall that by the Burkholder–Davis–Gundy inequality there exists a constant $\bar{\kappa }>0$ depending only on p such that for every d-dimensional martingale $(M_n)_{n\in {{\mathbb {N}}}_0}$,

$$\begin{aligned} {{\mathbb {E}}}\bigl [\max _{0\le k\le n}\Vert M_k\Vert ^p\bigr ]\le {\bar{\kappa }}\, {{\mathbb {E}}}\bigl [ [M]_n^{p/2}\bigr ]. \end{aligned}$$
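For orientation, in the case $p=2$ one admissible choice is $\bar{\kappa }=4$. This is a standard fact rather than a statement from the article: assuming that $M$ is square integrable, it follows from Doob's $L^2$ maximal inequality applied to the submartingale $(\Vert M_k\Vert )_k$ together with the orthogonality of martingale increments,

$$\begin{aligned} {{\mathbb {E}}}\bigl [\max _{0\le k\le n}\Vert M_k\Vert ^2\bigr ]\le 4\, {{\mathbb {E}}}\bigl [\Vert M_n\Vert ^2\bigr ] = 4\, {{\mathbb {E}}}\Bigl [\Vert M_0\Vert ^2+\sum _{k=1}^{n}\Vert \Delta M_k\Vert ^2\Bigr ] = 4\, {{\mathbb {E}}}\bigl [ [M]_n\bigr ]. \end{aligned}$$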

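We fix a time horizon $T\in {{\mathbb {N}}}_0$ and prove the statement of the theorem with $\kappa = {\bar{\kappa }}$ by induction: we say that the statement holds up to time $t\in \{0,\dots ,T\}$, if for every d-dimensional adapted process $(\zeta _n)_{n\in {{\mathbb {N}}}_0}$, for every d-dimensional previsible process $(\xi _n)_{n\in {{\mathbb {N}}}}$ and for every d-dimensional martingale $(M_n)_{n\in {{\mathbb {N}}}_0}$ with $M_0=\zeta _0$ and

$$\begin{aligned} \zeta _n={\left\{ \begin{array}{ll} \xi _n+\Delta M_n \text { with } \Vert \xi _n\Vert \le \Vert \zeta _{n-1}\Vert ,&{}\quad \text { if }1\le n\le t,\\ \zeta _{t}+M_{n}-M_{t}, &{}\quad \text { if }n>t\end{array}\right. } \end{aligned}$$

($C_t$)

one has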

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert \zeta _n\Vert ^p\right] \le {\bar{\kappa }}\, {{\mathbb {E}}}\bigl [ [M]_T^{p/2}\bigr ]. \end{aligned}$$

Clearly, the statement is satisfied up to time 0 as a consequence of the Burkholder–Davis–Gundy inequality. Next, suppose that the statement is satisfied up to time $t\in \{0,\dots ,T-1\}$. Let $(\zeta _n)_{n\in {{\mathbb {N}}}_0}$ be a d-dimensional adapted process, $(\xi _n)_{n\in {{\mathbb {N}}}}$ be a d-dimensional previsible process and $(M_n)_{n\in {{\mathbb {N}}}_0}$ be a d-dimensional martingale satisfying property ($C_{t+1}$). Consider any ${\mathcal {F}}_{t}$-measurable random orthonormal transformation U on $({{\mathbb {R}}}^d,\Vert \cdot \Vert )$ and put

$$\begin{aligned} \zeta ^U_n={\left\{ \begin{array}{ll} \zeta _n,&{}\quad \text { if }n\le t,\\ \zeta _{t}+U(M_{n}-M_{t}), &{}\quad \text { if }n>t\end{array}\right. } \end{aligned}$$

as well as

$$\begin{aligned} M^U_n={\left\{ \begin{array}{ll} M_n,&{}\quad \text { if }n\le t,\\ M_{t}+ U(M_{n}-M_{t}), &{}\quad \text { if }n>t.\end{array}\right. } \end{aligned}$$

Then it is easy to check that $(M^U_n)_{n\in {{\mathbb {N}}}_0}$ is a martingale with $[M^U]_n = [M]_n$ for all $n\in {{\mathbb {N}}}$. Furthermore, $(\zeta ^U_n)_{n\in {{\mathbb {N}}}_0}$ is adapted and the triple $(\zeta ^U,\xi , M^U)$ satisfies property ($C_t$). Hence, by the induction hypothesis,

$$\begin{aligned} {{\mathbb {E}}}\bigl [\max _{0\le n\le T}\Vert \zeta ^U_n\Vert ^p\bigr ] \le {\bar{\kappa }}\, {{\mathbb {E}}}\bigl [ [M^U]_T^{p/2}\bigr ] ={\bar{\kappa }}\, {{\mathbb {E}}}\bigl [ [M]_T^{p/2}\bigr ]. \end{aligned}$$

(107)
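The claim that $M^U$ is again a martingale with the same quadratic variation can be spelled out as follows. This is a routine check added for convenience; it uses only that $U$ is ${\mathcal {F}}_{t}$-measurable and norm-preserving. For $n>t$,

$$\begin{aligned} \Delta M^U_n&= U\,\Delta M_n,\\ {{\mathbb {E}}}[\Delta M^U_n|{\mathcal {F}}_{n-1}]&= U\,{{\mathbb {E}}}[\Delta M_n|{\mathcal {F}}_{n-1}] = 0,\\ \Vert \Delta M^U_n\Vert&= \Vert \Delta M_n\Vert , \end{aligned}$$

where the second line uses ${\mathcal {F}}_{t}\subset {\mathcal {F}}_{n-1}$, while for $n\le t$ the increments of $M^U$ and $M$ coincide.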

Note that for any such random orthonormal transformation U, the norm of the random variable $\zeta _n^U$ is the same as the norm of the variable ${\bar{\zeta }}_n^U$ given by

$$\begin{aligned} {\bar{\zeta }}^U_n={\left\{ \begin{array}{ll} \zeta _n,&{}\quad \text { if }n\le t,\\ U^* \zeta _{t}+ M_{n}-M_{t}, &{}\quad \text { if }n>t,\end{array}\right. } \end{aligned}$$

whence

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert {\bar{\zeta }}^U_n\Vert ^p\right] = {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert \zeta ^U_n\Vert ^p\right] . \end{aligned}$$

(108)

Clearly, we can choose an ${\mathcal {F}}_{t}$-measurable random orthonormal transformation U on $({{\mathbb {R}}}^d,\Vert \cdot \Vert )$ such that

$$\begin{aligned} U^* \zeta _t = \frac{\Vert \zeta _t\Vert }{\Vert \xi _{t+1}\Vert } \xi _{t+1} \end{aligned}$$

holds on $\{\xi _{t+1}\ne 0\}$. Let

$$\begin{aligned} \alpha = \frac{\Vert \xi _{t+1}\Vert +\Vert \zeta _t\Vert }{2\Vert \zeta _t\Vert }\cdot 1_{\{\zeta _t\ne 0\}}. \end{aligned}$$

Then $\alpha $ is ${\mathcal {F}}_{t}$-measurable and takes values in [0, 1] since $\Vert \xi _{t+1}\Vert \le \Vert \zeta _t\Vert $. Moreover, we have $\xi _{t+1} = \alpha U^* \zeta _t + (1-\alpha ) (-U)^* \zeta _t$ so that by property ($C_{t+1}$) of the triple $(\zeta ,\xi ,M)$,

$$\begin{aligned} \zeta _n= \xi _{t+1} + M_n-M_t = \alpha {\bar{\zeta }}^U_n+(1-\alpha ) {\bar{\zeta }}_n^{-U} \end{aligned}$$

for $n= t+1,\dots ,T$. Note that $\zeta _n={\bar{\zeta }}^U_n={\bar{\zeta }}_n^{-U}$ for $n=0,\dots ,t$. By convexity of $\Vert \cdot \Vert ^p$ we thus obtain

$$\begin{aligned} \max _{0\le n\le T}\Vert \zeta _n\Vert ^p&= \max _{0\le n\le T}\Vert \alpha {\bar{\zeta }}^U_n+ (1-\alpha ) {\bar{\zeta }}^{-U}_n\Vert ^p \\&\le \alpha \max _{0\le n\le T}\Vert {\bar{\zeta }}^U_n\Vert ^p + (1-\alpha )\max _{0\le n\le T}\Vert {\bar{\zeta }}^{-U}_n\Vert ^p. \end{aligned}$$

Hence

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert \zeta _{n}\Vert ^p|{\mathcal {F}}_{t}\right]&\le \alpha {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert {\bar{\zeta }}^U_n\Vert ^p|{\mathcal {F}}_{t}\right] + (1-\alpha ){{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert {\bar{\zeta }}^{-U}_n\Vert ^p|{\mathcal {F}}_{t}\right] \\&\le {{\mathbb {E}}}\left[ \max _{0\le n\le T}\Vert {\bar{\zeta }}^{U'}_n\Vert ^p|{\mathcal {F}}_{t}\right] , \end{aligned}$$

where $U'$ is the ${\mathcal {F}}_{t}$-measurable random orthonormal transformation given by

$$\begin{aligned} U'(\omega )= {\left\{ \begin{array}{ll} U(\omega )&{}\quad \text { if }\omega \in \bigl \{ {{\mathbb {E}}}[\max _{0\le n\le T}\Vert {\bar{\zeta }}^U_n\Vert ^p|{\mathcal {F}}_{t}]\ge {{\mathbb {E}}}[\max _{0\le n\le T}\Vert {\bar{\zeta }}^{- U}_n\Vert ^p|{\mathcal {F}}_{t}]\bigr \},\\ -U(\omega ) &{}\quad \text { otherwise}. \end{array}\right. } \end{aligned}$$

Applying (107) and (108) with $U=U'$ finishes the induction step.
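The key algebraic step of the induction, writing $\xi _{t+1}$ as a convex combination of $U^*\zeta _t$ and $(-U)^*\zeta _t$, can be checked numerically. The sketch below is only an illustration under stated assumptions: it realises $U$ as a Householder reflection (one possible choice; the proof only needs existence of such a $U$), and the names `aligning_reflection` and `mixing_weight` are hypothetical.

```python
import numpy as np


def aligning_reflection(zeta, xi):
    """Orthogonal U with U^T zeta = (||zeta|| / ||xi||) * xi, built as a Householder reflection."""
    target = np.linalg.norm(zeta) / np.linalg.norm(xi) * xi
    v = zeta - target
    if np.allclose(v, 0.0):
        return np.eye(len(zeta))                # zeta already points along xi
    # Householder matrices are symmetric and orthogonal, hence U^T = U
    return np.eye(len(zeta)) - 2.0 * np.outer(v, v) / (v @ v)


def mixing_weight(zeta, xi):
    """alpha = (||xi|| + ||zeta||) / (2 ||zeta||) on {zeta != 0}, and 0 otherwise."""
    nz = np.linalg.norm(zeta)
    return 0.0 if nz == 0.0 else (np.linalg.norm(xi) + nz) / (2.0 * nz)


rng = np.random.default_rng(1)
zeta_t = rng.normal(size=3)
direction = rng.normal(size=3)
# proposal with ||xi_{t+1}|| = ||zeta_t|| / 2, so the constraint ||xi|| <= ||zeta|| holds
xi_next = 0.5 * np.linalg.norm(zeta_t) * direction / np.linalg.norm(direction)

U = aligning_reflection(zeta_t, xi_next)
alpha = mixing_weight(zeta_t, xi_next)
# alpha * U^* zeta_t + (1 - alpha) * (-U)^* zeta_t should reproduce xi_{t+1}
recombined = alpha * (U.T @ zeta_t) + (1.0 - alpha) * ((-U).T @ zeta_t)
print(np.allclose(recombined, xi_next), alpha)
```

The identity behind the check is $(2\alpha -1)\,U^*\zeta _t = \frac{\Vert \xi _{t+1}\Vert }{\Vert \zeta _t\Vert }\,U^*\zeta _t = \xi _{t+1}$.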

Next, we consider the case of $c > 0$. Suppose that $\zeta ,\xi $ and M are as stated in the theorem. For $n\in {{\mathbb {N}}}$ we put

$$\begin{aligned} {\tilde{\xi }}_n = (1-c/\Vert \xi _n\Vert )_+ \cdot \xi _n \end{aligned}$$

and

$$\begin{aligned} {\tilde{\zeta }}_n = {\tilde{\xi }}_n + \Delta M_n. \end{aligned}$$

Furthermore, let ${\tilde{\zeta }}_0=\zeta _0=M_0$. We will show that the triple $({\tilde{\zeta }},{\tilde{\xi }},M)$ satisfies (106) with $c=0$. Clearly, $({\tilde{\zeta }}_n)_{n\in {{\mathbb {N}}}_0}$ is adapted and $(\tilde{\xi }_n)_{n\in {{\mathbb {N}}}}$ is previsible. Note that $\Vert \xi _n-{\tilde{\xi }}_n\Vert \le c$ for all $n\in {{\mathbb {N}}}$. Moreover, one has for $n\ge 2$ on $\{\Vert \xi _n\Vert > c\}$ that

$$\begin{aligned} \Vert {\tilde{\xi }}_n\Vert&= \Vert \xi _n\Vert -c\le \Vert \zeta _{n-1}\Vert -c=\Vert \tilde{\zeta }_{n-1}+\xi _{n-1}-{\tilde{\xi }}_{n-1}\Vert -c\\&\le \Vert {\tilde{\zeta }}_{n-1}\Vert +\Vert \xi _{n-1}-{\tilde{\xi }}_{n-1}\Vert -c \le \Vert {\tilde{\zeta }}_{n-1}\Vert \end{aligned}$$

(for $n=1$ the same bound is immediate from ${\tilde{\zeta }}_0=\zeta _0$), and on $\{\Vert \xi _n\Vert \le c\}$ that $\Vert {\tilde{\xi }}_n\Vert =0\le \Vert \tilde{\zeta }_{n-1}\Vert $. We may thus apply Theorem 5.1 with $c=0$ to obtain that for every $n\in {{\mathbb {N}}}$,

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le k \le n}\Vert {\tilde{\zeta }}_k\Vert ^p\right] \le \bar{\kappa }\, {{\mathbb {E}}}\bigl [ [ M]_n^{p/2}\bigr ]. \end{aligned}$$
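The reduction to $c=0$ rests on two properties of the radial truncation ${\tilde{\xi }}_n = (1-c/\Vert \xi _n\Vert )_+ \cdot \xi _n$: its norm is $(\Vert \xi _n\Vert -c)_+$, and it moves $\xi _n$ by at most $c$. A small Python sketch can confirm both on random vectors. The function name `radial_shrink` is hypothetical, and mapping the zero vector to itself is an assumed convention, since the formula is not defined at $\xi _n=0$.

```python
import numpy as np


def radial_shrink(xi, c):
    """Return (1 - c/||xi||)_+ * xi; vectors of norm at most c (including 0) map to 0."""
    norm = np.linalg.norm(xi)
    if norm <= c:
        return np.zeros_like(xi)
    return (1.0 - c / norm) * xi


rng = np.random.default_rng(2)
c = 1.0
for _ in range(1000):
    xi = rng.normal(scale=2.0, size=4)
    xi_tilde = radial_shrink(xi, c)
    # the shrunk vector has norm (||xi|| - c)_+
    assert np.isclose(np.linalg.norm(xi_tilde), max(np.linalg.norm(xi) - c, 0.0))
    # the displacement is at most c, which is what the final bound on ||zeta_n|| uses
    assert np.linalg.norm(xi - xi_tilde) <= c + 1e-12
```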

Since for every $n\in {{\mathbb {N}}}$,

$$\begin{aligned} \Vert \zeta _n\Vert ^p = \Vert {\tilde{\zeta }}_n + \xi _n-{\tilde{\xi }}_n\Vert ^p \le 2^p(\Vert {\tilde{\zeta }}_n\Vert ^p + c^p), \end{aligned}$$

we conclude that

$$\begin{aligned} {{\mathbb {E}}}\left[ \max _{0\le k \le n}\Vert \zeta _k\Vert ^p\right] \le 2^p\bigl (\bar{\kappa }\, {{\mathbb {E}}}\bigl [ [ M]_n^{p/2}\bigr ] + c^p\bigr ) \le 2^p({\bar{\kappa }} \vee 1) \cdot \bigl ( {{\mathbb {E}}}\bigl [ [ M]_n^{p/2}\bigr ] + c^p\bigr ), \end{aligned}$$

which completes the proof. $\square $
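The elementary inequality used in the last step of the proof can be verified as follows; the computation is a standard convexity argument added here for convenience. For $a,b\in {{\mathbb {R}}}^d$ and $p\in [1,\infty )$, convexity of $x\mapsto x^p$ on $[0,\infty )$ gives

$$\begin{aligned} \Vert a+b\Vert ^p \le \bigl (\Vert a\Vert +\Vert b\Vert \bigr )^p = 2^p\Bigl (\frac{\Vert a\Vert +\Vert b\Vert }{2}\Bigr )^p \le 2^{p-1}\bigl (\Vert a\Vert ^p+\Vert b\Vert ^p\bigr ) \le 2^{p}\bigl (\Vert a\Vert ^p+\Vert b\Vert ^p\bigr ), \end{aligned}$$

which is applied with $a={\tilde{\zeta }}_n$ and $b=\xi _n-{\tilde{\xi }}_n$, where $\Vert b\Vert \le c$. In particular, the factor $2^p$ in the proof could be replaced by $2^{p-1}$ without changing the argument.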

About this article

Cite this article

Dereich, S., Müller-Gronbach, T. General multilevel adaptations for stochastic approximation algorithms of Robbins–Monro and Polyak–Ruppert type. Numer. Math. 142, 279–328 (2019). https://doi.org/10.1007/s00211-019-01024-y

Received: 01 May 2017
Revised: 07 January 2019
Published: 06 February 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s00211-019-01024-y
