
Steepest Descent Method with Random Step Lengths

Published in Foundations of Computational Mathematics.

Abstract

The paper studies the steepest descent method applied to the minimization of a twice continuously differentiable function. Under certain conditions, a random choice of the step length parameter, independent of the actual iteration, generates a process that is almost surely R-convergent for quadratic functions. The convergence properties of this random procedure are characterized by the mean value function related to the distribution of the step length parameter. The distribution of the random step length that guarantees the maximum asymptotic convergence rate, independent of the detailed properties of the Hessian matrix of the minimized function, is found, and its uniqueness is proved. The asymptotic convergence rate of this optimally created random procedure equals the convergence rate of the Chebyshev polynomials method. Under practical conditions, the efficiency of the suggested random steepest descent method is degraded by numerical noise, particularly for ill-conditioned problems; furthermore, the asymptotic convergence rate is not attained because only finitely many steps are actually computed. The suggested random procedure is also applied to the minimization of a general non-quadratic function, and an algorithm estimating the relevant bounds of the Hessian matrix spectrum is created. In certain cases, the random procedure may surpass the conjugate gradient method. Interesting results are achieved when minimizing functions with a large number of local minima. Preliminary numerical experiments show that some modifications of the presented basic method may significantly improve its properties.
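The procedure described in the abstract can be sketched in a few lines: ordinary steepest descent in which the inverse step length is sampled at random, independently of the iterate. The diagonal test matrix and the uniform distribution of the inverse step length below are illustrative assumptions only (the paper derives the optimal distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic model problem V(x) = x^T A x / 2 - b^T x; A diagonal w.l.o.g.
# (steepest descent is invariant under orthogonal changes of basis).
lam, Lam = 1.0, 25.0                 # assumed bounds of the Hessian spectrum
A = np.diag([lam, 4.0, 9.0, Lam])
b = np.ones(4)

x = np.zeros(4)
for _ in range(5000):
    g = A @ x - b                    # gradient of V at the current iterate
    l = rng.uniform(lam, Lam)        # random inverse step length l_j
    x = x - g / l                    # steepest descent step, gamma_j = 1/l_j

residual = np.linalg.norm(A @ x - b)
print(residual)
```

Each residual coordinate is multiplied by \(1-\lambda_i/l_j\) in step j, so the process converges almost surely whenever the mean value of \(\ln|1-\lambda_i/l|\) is negative for every eigenvalue, which holds for the uniform choice above.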


Notes

  1. Except for the cases when the Hessian matrix at the local minimum of \(V(\mathsf {x})\) is singular.

  2. The indirect way to minimize \(V(\mathsf {x}_{j+s})\) using s steps of the conjugate gradient method is not considered now because it differs from the steepest descent method (1.1).

  3. The possibility that the Barzilai–Borwein algorithm can converge only linearly, with the logarithmic convergence rate \(\sim \frac{2}{\kappa (\mathsf {A})+1}\), when an inappropriate starting point is chosen was already mentioned in [2]. Nevertheless, this situation occurs with zero probability.

  4. In the first step only, the vectors \(\mathsf {r}_{j-1}-\mathsf {r}_j\) and the values \(\gamma _j\) for \(j\le 0\) are not yet defined, so the values \(\overline{\lambda }^{(1)}\), \(\underline{\Lambda }^{(1)}\) were expressed directly using the vector \(\mathsf {Ar}_0\).

  5. The importance of the deviations in comparison with the R-linear descent of the residual norm decreases as the number of steps increases, but the limited accuracy of the computation does not allow us to realize the related observation.

  6. The convergence factor (3.15) is applicable only to the eigenspaces with \(\lambda _i\in \langle \overline{\lambda }^{(n)},\underline{\Lambda }^{(n)}\rangle \).

  7. The narrowed interval of the possible values \(l_j\) causes a more efficient suppression of the coordinates \(\beta _i^{(n)}\) for \(\lambda _i\) from the actual interval \(\langle \overline{\lambda }^{(n)},\underline{\Lambda }^{(n)}\rangle \). The number of the eigenvalues \(\lambda _i\not \in \langle \overline{\lambda }^{(n)},\underline{\Lambda }^{(n)}\rangle \) is much lower than the problem dimension M, so the insufficient efficiency of the process on the related eigenspaces does not significantly affect the total value \(\Vert \mathsf {r}_n\Vert \) while the decrease in the norm \(\Vert \mathsf {r}_n\Vert \) is small.

  8. The algorithm itself does not need the second derivatives; nevertheless, they are hidden in the considerations in “Appendix 6.”

  9. The number of steps is equal to the number of calculated gradients for other studied methods.

  10. A “safety” parameter of this kind is implicitly present in the algorithm as a consequence of the inexact estimates \(\overline{\lambda }^{(n)}>\lambda \), \(\underline{\Lambda }^{(n)}<\Lambda \).

  11. Algorithm 2 is a special realization of this idea.

  12. In fact, the attempt to interpret the Barzilai–Borwein method in a nonstandard way was the origin of the work resulting in this article.

  13. The mean value \(\widehat{E}_R(\lambda _i)\) exists by the definition of a point of the first type relative to the measure \(\phi \), and the condition \(\beta ^{(0)}_i\ne 0\) is included in the assumption \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\).

  14. We avoid wasting time calculating the gradient \(\mathsf {g}(\mathsf {x}_0-\gamma _1\mathsf {g}_0)\) until it is needed for the creation of the following step.
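Several of the notes above compare the random procedure with the Barzilai–Borwein method; for orientation, here is a minimal sketch of the classical BB1 step length on a quadratic (the test matrix, the first safe step, and the iteration budget are illustrative assumptions):

```python
import numpy as np

# Barzilai-Borwein (BB1) gradient method on V(x) = x^T A x / 2 - b^T x.
# gamma_j = (s^T s)/(s^T y) with s = x_j - x_{j-1}, y = g_j - g_{j-1};
# for a quadratic y = A s, so 1/gamma_j is a Rayleigh quotient of A.
A = np.diag([1.0, 4.0, 9.0, 25.0])   # illustrative test spectrum
b = np.ones(4)

x_prev = np.zeros(4)
g_prev = A @ x_prev - b
x = x_prev - g_prev / 25.0           # first step with a safe gamma <= 1/Lambda
for _ in range(200):
    g = A @ x - b
    if np.linalg.norm(g) < 1e-10:
        break                        # converged; avoids a 0/0 step length
    s, y = x - x_prev, g - g_prev
    gamma = (s @ s) / (s @ y)        # BB1 step length
    x_prev, g_prev = x, g
    x = x - gamma * g                # not in-place: x_prev must keep its value

residual = np.linalg.norm(A @ x - b)
print(residual)
```

The nonstandard interpretation of this step length (Footnote 12) is what the random procedure replaces by sampling the inverse step length from a fixed distribution.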

References

  1. H. Akaike: On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Ann. Inst. Statist. Math. Tokyo 11 (1959), 1–16

  2. J. Barzilai and J. M. Borwein: Two-point step size gradient methods. IMA Journal of Numerical Analysis 8 (1988), 141–148

  3. A. Cauchy: Méthode générale pour la résolution des systèmes d’équations simultanées. Comptes Rendus Hebd. Séances Acad. Sci. 25 (1847), 536–538

  4. Y. H. Dai and R. Fletcher: Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numerische Mathematik 100 (2005), No. 1, 21–47

  5. Y. H. Dai and L. Z. Liao: \(R\)-linear convergence of the Barzilai-Borwein gradient method. IMA Journal of Numerical Analysis 22 (2002), 1–10

  6. R. Fletcher and C. M. Reeves: Function minimization by conjugate gradients. The Computer Journal 7 (1964), 149–154

  7. G. E. Forsythe: On the asymptotic directions of the \(s\)-dimensional optimum gradient method. Numerische Mathematik 11 (1968), 57–76

  8. A. Friedlander, J. M. Martínez, N. Molina and M. Raydan: Gradient method with retards and generalizations. SIAM J. Numer. Anal. 36 (1999), 275–289

  9. Z. Kalousek: Appeal of inexact calculations. In: Proceedings of the conference “Modern mathematical methods in engineering”, VŠB-TU Ostrava, 2013

  10. F. Luengo and M. Raydan: Gradient method with dynamical retards for large-scale optimization problems. Electronic Transactions on Numerical Analysis 16 (2003), 186–193

  11. Y. Narushima, T. Wakamatsu and H. Yabe: Extended Barzilai-Borwein method for unconstrained minimization problems. Pacific Journal of Optimization 6 (2010), 591–613

  12. L. Pronzato, P. Wynn and A. A. Zhigljavsky: A dynamical-system analysis of the optimum \(s\)-gradient algorithm. In: Optimal Design and Related Areas in Optimization and Statistics (L. Pronzato and A. A. Zhigljavsky, eds.), Springer, 2009, pp. 39–80

  13. L. Pronzato and A. A. Zhigljavsky: Gradient algorithms for quadratic optimization with fast convergence rates. Computational Optimization and Applications 50 (2011), 597–617

  14. M. Raydan: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization 7 (1997), 26–33

  15. M. Raydan and B. F. Svaiter: Relaxed steepest descent and Cauchy-Barzilai-Borwein method. Computational Optimization and Applications 21 (2002), 155–167


Author information

Correspondence to Zdeněk Kalousek.

Additional information

Communicated by Michael Overton.

Appendices

Appendix 1: Proof of Theorem 1

First, we prove an important property of the points of the second type:

Lemma 3

If \(\hat{x}\in \langle \lambda ,\Lambda \rangle \) is a point of the second type relative to the measure \(\phi \), then

$$\begin{aligned} \lim _{\delta \rightarrow 0^+}\int \limits _{ \langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l)=-\infty \ . \end{aligned}$$

Proof

Since \(\phi (\{\hat{x}\})=0\), we have

$$\begin{aligned} \widehat{E}_R(\hat{x})= & {} \int \limits _{\langle \lambda ,\Lambda \rangle } \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l) =\int \limits _{\langle \lambda ,\hat{x})\cup (\hat{x},\Lambda \rangle } \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l)\nonumber \\= & {} \lim _{\delta \rightarrow 0^+}\int \limits _{ \langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l)\ . \end{aligned}$$
(8.1)

The value \(\widehat{E}_R(\hat{x})\) is undefined; therefore, the limit on the right-hand side of (8.1) is either infinite or it does not exist at all. The function

$$\begin{aligned} g(\delta )=\int \limits _{ \langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l) \end{aligned}$$

is non-decreasing on \((0,\frac{\hat{x}}{2})\), as follows from

$$\begin{aligned} g(\delta _1)-g(\delta _2)=\int \limits _{ \langle \lambda ,\Lambda \rangle \cap [(\hat{x}-\delta _2,\hat{x}-\delta _1\rangle \cup \langle \hat{x}+\delta _1,\hat{x}+\delta _2)]} \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l)\le 0 \end{aligned}$$
(8.2)

for \(0<\delta _1<\delta _2<\frac{\hat{x}}{2}\), because the argument of the integrated logarithm is

$$\begin{aligned} \left| 1-\frac{\hat{x}}{l}\right| =\frac{|l-\hat{x}|}{l}<\frac{~\frac{\hat{x}}{2}~}{\frac{\hat{x}}{2}} =1 \end{aligned}$$

and a non-positive function is integrated in (8.2). The limit of the non-decreasing function \(g(\delta )\) in (8.1) exists, and since it is not finite according to the assumptions, it must be equal to \(-\infty \). \(\square \)

Now, we can prove the particular statements of Theorem 1. The convergence of the process to the solution is equivalent to the convergence of the residuals \(\mathsf {r}_n\) to zero, so we will deal with the sequence of the residuals.

  1. We will observe the coordinates \(\beta _i^{(j)}\) of the residual \(\mathsf {r}_j\) according to (2.1). The set \(\mathcal {S}_3^{(\phi )}(\mathsf {A,b})\) is finite; therefore, the value

    $$\begin{aligned} s_{\phi ,\mathsf {A,b}} =\min _{\lambda _i\in \mathcal {S}_3^{(\phi )}(\mathsf {A,b})} \phi \left( \{\lambda _i\}\right) >0 \end{aligned}$$

    exists. This result implies the inequality

    $$\begin{aligned} \Pr \left( l_j\ne \lambda _i\right) \le 1-s_{\phi ,\mathsf {A,b}} \end{aligned}$$

    for any eigenvalue \(\lambda _i\in \mathcal {S}(\mathsf {A,b})\). If \(l_j=\lambda _i\) for at least one \(j=1,2,\dots ,n\), then, as a consequence of (2.2), \(\beta _i^{(n)}=0\). Thus, the independence of the values \(l_j\) for different j leads to

    $$\begin{aligned} \Pr \left( \beta _i^{(n)}\ne 0\right) =\Pr \biggl (\bigcap _{j=1}^n\left[ l_j\ne \lambda _i\right] \biggr ) =\prod _{j=1}^n\Pr \left( l_j\ne \lambda _i\right) \le (1-s_{\phi ,\mathsf {A,b}})^n\ . \end{aligned}$$
    (8.3)

    The probability that the residual \(\mathsf {r}_n\ne 0\) is the same as the probability of at least one coordinate \(\beta _i^{(n)}\) of the residual \(\mathsf {r}_n\) being nonzero, so

    $$\begin{aligned} \Pr \left( \mathsf {r}_n\ne 0\right) =\Pr \biggl (\bigcup _{i=1}^N\left[ \beta _i^{(n)}\ne 0\right] \biggr ) \le \sum _{i=1}^N\Pr \left( \beta _i^{(n)}\ne 0\right) \le N(1-s_{\phi ,\mathsf {A,b}})^n \end{aligned}$$

    and this probability converges to zero as \(n\rightarrow \infty \).
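Statement 1 can be observed directly: sampling \(l_j\) from a purely atomic measure concentrated on the eigenvalues (an illustrative choice; the spectrum and trial count below are assumptions) annihilates every coordinate \(\beta _i^{(n)}\) exactly, and the surviving fraction obeys the bound \(N(1-s_{\phi ,\mathsf {A,b}})^n\):

```python
import numpy as np

rng = np.random.default_rng(1)

# If the step-length measure has an atom at every eigenvalue, coordinate
# beta_i is annihilated exactly (update (2.2)) as soon as l_j = lambda_i.
eigs = np.array([1.0, 3.0, 10.0])    # illustrative spectrum, N = 3
s_min = 1.0 / 3.0                    # smallest atom weight (equal atoms)
n, trials = 60, 1000

surviving = 0
for _ in range(trials):
    beta = np.ones_like(eigs)        # coordinates beta_i^(0)
    for l in rng.choice(eigs, size=n):
        beta *= 1.0 - eigs / l       # exact zero whenever l equals eigs[i]
    surviving += bool(np.any(beta != 0.0))

bound = 3 * (1.0 - s_min) ** n       # N (1 - s)^n from the proof
print(surviving, bound)
```

With these parameters the bound is below \(10^{-9}\), so no trial is expected to survive 60 steps.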

  2. In the case \(\mathcal {S}_1^{(\phi )}(\mathsf {A,b})\ne \emptyset \), we get, according to (2.3) and (2.4) and due to the validity of the strong law of large numbers,

    $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_i|}{|\beta ^{(0)}_i|} =\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{j=1}^n\widetilde{R}_i^{(j)} =\widehat{E}_R(\lambda _i) \end{aligned}$$
    (8.4)

    for any \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\) with probability one (see Footnote 13).

    (a) Let \(E_{\phi ,\mathsf {A,b}}>0\). Then, there exists an eigenvalue \(\lambda _{\hat{\imath }}\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\) such that \(\widehat{E}_R(\lambda _{\hat{\imath }})=E_{\phi ,\mathsf {A,b}}>0\). We express the probability that there exists an infinite bounded subsequence of coordinates \(\{\beta _{\hat{\imath }}^{(n_k)}\}_{k>0}\). This event occurs when, for some \(C\in \mathcal {R}\) and all \(k\in \mathcal {N}\),

      $$\begin{aligned} \frac{1}{n_k}\ln \frac{|\beta ^{(n_k)}_{\hat{\imath }}|}{|\beta ^{(0)}_{\hat{\imath }}|} <\frac{1}{n_k}\ln \frac{C}{|\beta ^{(0)}_{\hat{\imath }}|}\ ; \end{aligned}$$
      (8.5)

      The necessary condition to satisfy (8.5) is

      $$\begin{aligned} \limsup _{k\rightarrow \infty }\frac{1}{n_k}\ln \frac{|\beta ^{(n_k)}_{\hat{\imath }}|}{|\beta ^{(0)}_{\hat{\imath }}|} \le \limsup _{k\rightarrow \infty }\frac{1}{n_k}\ln \frac{C}{|\beta ^{(0)}_{\hat{\imath }}|} =0\ . \end{aligned}$$
      (8.6)

      The relation (8.4) applied to \(\lambda _{\hat{\imath }}\)

      $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_{\hat{\imath }}|}{|\beta ^{(0)}_{\hat{\imath }}|} =\widehat{E}_R(\lambda _{\hat{\imath }})>0 \end{aligned}$$

      characterizes an event that is incompatible with the event characterized by (8.6). Therefore, the probability of the existence of an infinite bounded subsequence of coordinates \(\{\beta _{\hat{\imath }}^{(n_k)}\}_{k>0}\) is

      $$\begin{aligned} \Pr \left( \forall k\in \mathcal {N}:|\beta ^{(n_k)}_{\hat{\imath }}|<C\right)\le & {} \Pr \biggl (\limsup _{k\rightarrow \infty } \frac{1}{n_k}\ln \frac{|\beta ^{(n_k)}_{\hat{\imath }}|}{|\beta ^{(0)}_{\hat{\imath }}|}\le 0\biggr )\ \le \\\le & {} 1-\Pr \biggl (\lim _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_{\hat{\imath }}|}{|\beta ^{(0)}_{\hat{\imath }}|}>0\biggr )\ =\ 0, \end{aligned}$$

      and it implies

      $$\begin{aligned} \lim _{n\rightarrow \infty }|\beta ^{(n)}_{\hat{\imath }}|=\infty \end{aligned}$$

      with probability one. Since

      $$\begin{aligned} \Vert \mathsf {r}_n\Vert \ge \max _{\lambda _i\in \mathcal {S}^{(\phi )}(\mathsf {A,b})}|\beta ^{(n)}_i|\ , \end{aligned}$$

      the sequence of residuals, and consequently the sequence of iterates, is almost surely divergent.

    (b) We prove the convergence of the process in both cases 2(b) and 3 together. We will study the convergence of the sequence \(\{\beta _i^{(n)}\}_{n\ge 0}\) for general \(i=1,2,\dots ,N\) such that \(\lambda _i\in \mathcal {S}^{(\phi )}(\mathsf {A,b})\). If \(\lambda _i\in \mathcal {S}_3^{(\phi )}(\mathsf {A,b})\), the relation (8.3) implies

      $$\begin{aligned} \Pr \left( \lim _{n\rightarrow \infty }\beta _i^{(n)}=0\right) =1\ . \end{aligned}$$
      (8.7)

      For \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\), the mean value \(\widehat{E}_R(\lambda _i)\) exists. Analogously to (8.4)

      $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_i|}{|\beta ^{(0)}_i|} =\lim _{n\rightarrow \infty }\frac{1}{n}\sum _{j=1}^n\widetilde{R}_i^{(j)} =\widehat{E}_R(\lambda _i)\le E_{\phi ,\mathsf {A,b}}<0 \end{aligned}$$
      (8.8)

      with probability one. Let us suppose

      $$\begin{aligned} \lim _{n\rightarrow \infty }\beta _i^{(n)}\ne 0\ . \end{aligned}$$

      We now formulate this assumption in the standard way: there exist \(\varepsilon >0\) and an infinite subsequence \(\{n_k\}_{k\ge 1}\subset \mathcal {N}\) such that \(|\beta _i^{(n_k)}|\ge \varepsilon \) for all \(k\in \mathcal {N}\). The technique applied above gives

      $$\begin{aligned} \Pr \left( \lim _{n\rightarrow \infty }\beta _i^{(n)}\ne 0\right)= & {} \Pr \left( \forall k\in \mathcal {N}:|\beta _i^{(n_k)}|\ge \varepsilon \right) \nonumber \\\le & {} \Pr \biggl (\liminf _{k\rightarrow \infty } \frac{1}{n_k}\ln \frac{|\beta _i^{(n_k)}|}{|\beta ^{(0)}_i|}\ge 0\biggr )\\\le & {} 1-\Pr \biggl (\lim _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_i|}{|\beta ^{(0)}_i|}<0\biggr )\ =\ 0\ . \end{aligned}$$

      Thus, the relation (8.7) holds for \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\) too. Let \(\lambda _i\in \mathcal {S}_2^{(\phi )}(\mathsf {A,b})\). Then, the statement of Lemma 3 implies that for any \(K<0\) there exists \(\delta _K>0\) such that

      $$\begin{aligned} \int \limits _{ \langle \lambda ,\Lambda \rangle \setminus (\lambda _i-\delta _K,\lambda _i+\delta _K)} \ln \left| 1-\frac{\lambda _i}{l}\right| \mathrm {d}\phi (l)\le K\ , \end{aligned}$$
      (8.9)

      this integral being finite. We define the random variable

      $$\begin{aligned} Q_i^{(j)}=\left\{ \begin{array}{ll} \ln \left| 1-\frac{\lambda _i}{l_j}\right| &{}\hbox {for} \quad l_j\in \langle \lambda ,\Lambda \rangle \setminus (\lambda _i-\delta _K,\lambda _i+\delta _K)\ ,\\ 0&{}\hbox {for}\quad l_j\in \langle \lambda ,\Lambda \rangle \cap (\lambda _i-\delta _K,\lambda _i+\delta _K) \end{array}\right. \end{aligned}$$

      The variable \(Q_i^{(j)}\) has the mean value \(E(Q_i)\le K\) due to (8.9), and for any given value \(l_j\) of the random variable \(L\), the inequality \(\widetilde{R}_i^{(j)}\le Q_i^{(j)}\) always holds. Therefore,

      $$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta ^{(n)}_i|}{|\beta ^{(0)}_i|} =\limsup _{n\rightarrow \infty }\frac{1}{n}\sum _{j=1}^n\widetilde{R}_i^{(j)} \le \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{j=1}^n Q_i^{(j)} =E(Q_i)\le K<0 \end{aligned}$$
      (8.10)

      almost surely, and the relation (8.7) may be proved using the same method as for \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\). We define the vectors

      $$\begin{aligned} \mathsf {r}_n^{(k)}=\sum _{i=1}^k\beta _i^{(n)}\mathsf {e}_i \end{aligned}$$

      for \(k=0,1,\dots ,N\); thus, \(\mathsf {r}_n^{(N)}=\mathsf {r}_n\), \(\mathsf {r}_n^{(0)}=\mathsf {0}\). We express the probabilities

      $$\begin{aligned} \Pr \left( \lim _{n\rightarrow \infty }\Vert \mathsf {r}_n^{(k)}-\mathsf {r}_n^{(k-1)}\Vert =0\right) =\Pr \left( \lim _{n\rightarrow \infty }|\beta _k^{(n)}|=0\right) =1 \end{aligned}$$

      as a consequence of (8.7) for any \(k=1,2,\dots ,N\). It implies

      $$\begin{aligned} \lim _{n\rightarrow \infty }\mathsf {r}_n =\lim _{n\rightarrow \infty }\mathsf {r}_n^{(N)} =\lim _{n\rightarrow \infty }\mathsf {r}_n^{(N-1)} =\dots =\lim _{n\rightarrow \infty }\mathsf {r}_n^{(0)}=0 \end{aligned}$$

      with probability one. As for the rate of convergence, we express the values of \(\displaystyle \lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|}\). If \(\beta _i^{(0)}=0\), all the members of the considered sequence are zero; otherwise,

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|} =\lim _{n\rightarrow \infty }\root n \of {\frac{|\beta _i^{(n)}|}{|\beta _i^{(0)}|}}\ . \end{aligned}$$
      (8.11)

      If \(\lambda _i\in \mathcal {S}_3^{(\phi )}(\mathsf {A,b})\), then the relation (8.3) implies the limit (8.11) is almost surely equal to zero. For the values \(\lambda _i\in \mathcal {S}_2^{(\phi )}(\mathsf {A,b})\), we get, based on (8.10),

      $$\begin{aligned} \limsup _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|} =e^{~\limsup \limits _{n\rightarrow \infty }\frac{1}{n}\ln \frac{|\beta _i^{(n)}|}{|\beta _i^{(0)}|}} \le e^K \end{aligned}$$
      (8.12)

      with probability one for arbitrary \(K<0\); therefore, the limit (8.11) is almost surely zero in this case. Analogously, the relation (8.8) implies for \(\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})\)

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|} =e^{\widehat{E}_R(\lambda _i)}\ . \end{aligned}$$

      with probability one. Now, we can evaluate

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }}= & {} \lim _{n\rightarrow \infty }\root 2n \of {\sum _{i=1}^N\left( \beta _i^{(n)}\right) ^2}\nonumber \\\ge & {} \lim _{n\rightarrow \infty }\root 2n \of {\max _{i\le N}\left( \beta _i^{(n)}\right) ^2} =\max _{i\le N}\lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|}\ ; \end{aligned}$$
      (8.13)

      the last equality is valid if all the limits on the right-hand side of (8.13) exist. However, their almost sure existence is confirmed by the preceding results. Analogously,

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }} \le \lim _{n\rightarrow \infty }\root 2n \of { N\cdot \max _{i\le N}\left( \beta _i^{(n)}\right) ^2} =\max _{i\le N}\lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|} \end{aligned}$$

      with probability one, thus

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }} =\max _{i\le N}\lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|} \end{aligned}$$
      (8.14)

      almost surely. In the case \(\mathcal {S}_1^{(\phi )}(\mathsf {A,b})\ne \emptyset \), we have, by definition (2.5),

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }} =\max _{\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})}e^{\widehat{E}_R(\lambda _i)} =e^{E_{\phi ,\mathsf {A,b}}} \end{aligned}$$
      (8.15)

      with probability one. Taking the logarithm of (8.15) yields (2.6).

    (c) In the case \(E_{\phi ,\mathsf {A,b}}=0\), the technique used in the proofs of statements 2(a) and 2(b) cannot be applied; nevertheless, if we admit the possibility of the convergence of the process to the solution, then, according to (8.14),

      $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }} =\max _{\lambda _i\in \mathcal {S}_1^{(\phi )}(\mathsf {A,b})}e^{\widehat{E}_R(\lambda _i)}=1, \end{aligned}$$

      and the convergence is almost surely R-sublinear.

  3. When \(\mathcal {S}_1^{(\phi )}(\mathsf {A,b})=\emptyset \), the relations (8.3) and (8.12) imply

    $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {|\beta _i^{(n)}|}=0 \end{aligned}$$

    for all the eigenvalues \(\lambda _i\in \mathcal {S}(\mathsf {A,b})\). Therefore, according to (8.14),

    $$\begin{aligned} \lim _{n\rightarrow \infty }\root n \of {\frac{\Vert \mathsf {r}_n\Vert }{\Vert \mathsf {r}_0\Vert }} =\max _{i\le N}(0)=0 \end{aligned}$$

    almost surely, which proves statement 3.
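The two limit relations just proved, (8.4) for the individual coordinates and (8.15) for the norm, can be observed numerically. The sketch below assumes, purely for illustration, that L is uniform on \(\langle \lambda ,\Lambda \rangle \) with a three-point spectrum, so every eigenvalue lies in \(\mathcal {S}_1^{(\phi )}(\mathsf {A,b})\) and \(\widehat{E}_R\) has an elementary closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

lam, Lam = 1.0, 10.0                 # illustrative spectral bounds
eigs = np.array([1.0, 4.0, 9.0])     # every eigenvalue lies in S_1 here

def F(t):
    # Antiderivative of ln t, extended continuously to t = 0.
    return t * np.log(t) - t if t > 0.0 else 0.0

def E_R(x):
    # Closed-form mean E[ln|1 - x/L|] for L uniform on [lam, Lam]:
    # integrate ln|l - x| and ln l separately (both elementary).
    return (F(Lam - x) + F(x - lam) - (F(Lam) - F(lam))) / (Lam - lam)

# Cumulative sums S_i(n) = sum_j ln|1 - eigs_i/l_j|; by the strong law of
# large numbers (relation (8.4)), S_i(n)/n -> E_R(eigs_i) almost surely.
n = 200_000
l = rng.uniform(lam, Lam, size=n)
S = np.log(np.abs(1.0 - eigs[None, :] / l[:, None])).sum(axis=0)
emp_means = S / n
exact_means = np.array([E_R(x) for x in eigs])

# Relation (8.15): the n-th root of ||r_n||/||r_0|| tends to
# exp(max_i E_R(eigs_i)).  Work with log|beta_i| to avoid underflow.
m = S.max()
log_norm = m + 0.5 * np.log(np.sum(np.exp(2.0 * (S - m))))   # ln ||r_n||
rate_emp = log_norm / n
rate_exact = exact_means.max()
print(emp_means, exact_means, rate_emp, rate_exact)
```

The empirical log-rate agrees with \(E_{\phi ,\mathsf {A,b}}=\max_i\widehat{E}_R(\lambda _i)\) up to the \(O(n^{-1/2})\) fluctuation predicted by the strong law.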

Appendix 2: Proofs of Lemma 1 and Lemma 2

The integrals \(I_c\) and \(I_d\) exist for any \(q\in \langle \lambda ,\Lambda \rangle \), as both the singularities of the integrands are integrable. After the standard substitution

$$\begin{aligned} x=\frac{\Lambda +\lambda }{2}+\frac{\Lambda -\lambda }{2}\sin \alpha \ , \ \ \alpha \in \langle -\frac{\pi }{2},\frac{\pi }{2}\rangle , \end{aligned}$$
(8.16)

we get

$$\begin{aligned} \begin{aligned} I_c&= {} \int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \frac{\frac{\Lambda +\lambda -2q}{\Lambda -\lambda }+\sin \alpha }{\frac{\Lambda +\lambda }{\Lambda -\lambda }+\sin \alpha }\right| \mathrm {d}\alpha \\&=\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \frac{\Lambda +\lambda -2q}{\Lambda -\lambda }+\sin \alpha \right| \mathrm {d}\alpha -\int \limits _{- \frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \frac{\Lambda +\lambda }{\Lambda -\lambda }+\sin \alpha \right| \mathrm {d}\alpha \ . \end{aligned} \end{aligned}$$
(8.17)

The second integral on the right-hand side of (8.17) is finite and does not depend on the parameter q, so we can neglect it. The first integral (we will denote it as \(I_{c1}\)) can be expressed using the variable \(\varphi \) (see Fig. 12) defined by:

$$\begin{aligned} v(\varphi )\cos \varphi= & {} \frac{\Lambda -\lambda }{2}\cos \alpha \ , \end{aligned}$$
(8.18)
$$\begin{aligned} v(\varphi )\sin \varphi= & {} \frac{\Lambda +\lambda }{2}-q+\frac{\Lambda -\lambda }{2}\sin \alpha \ . \end{aligned}$$
(8.19)

As follows from Fig. 12, the mapping \(\varphi \longleftrightarrow \alpha \) is one-to-one for \(\alpha \in \langle -\frac{\pi }{2},\frac{\pi }{2}\rangle \), \(\varphi \in \langle -\frac{\pi }{2},\frac{\pi }{2}\rangle \) at \(q\in (\lambda ,\Lambda )\); for \(q=\lambda \), a one-to-one correspondence is ensured by the mapping \(\alpha \in (-\frac{\pi }{2},\frac{\pi }{2}\rangle \longleftrightarrow \varphi \in (0,\frac{\pi }{2}\rangle \) together with \(\alpha =-\frac{\pi }{2}\longleftrightarrow \varphi =0\); an analogous one-to-one mapping exists for \(q=\Lambda \). Therefore, we can define the functions \(\varphi (\alpha )\) and \(\alpha (\varphi )\) and use them as necessary.

Fig. 12: Geometric interpretation of the substitution (8.18), (8.19)

The integral \(I_{c1}\) may be expressed as

$$\begin{aligned} I_{c1}= & {} \int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \frac{2v(\varphi (\alpha ))\sin \varphi (\alpha )}{\Lambda -\lambda }\right| \mathrm {d}\alpha =\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \cos \alpha \tan \varphi (\alpha )\right| \mathrm {d}\alpha \nonumber \\= & {} \int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \cos \alpha \right| \mathrm {d}\alpha +\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \ln \left| \tan \varphi (\alpha )\right| \mathrm {d}\alpha \ . \end{aligned}$$
(8.20)

The first integral on the right-hand side of (8.20) is again finite and does not depend on the parameter q, so we neglect it. The remaining integral (we will further denote it as \(I_{c2}\)) can depend on the parameter q through the function \(\varphi (\alpha )\). We express it using \(\varphi \) as the integration variable. The geometric interpretation of the substitution \(\alpha \rightarrow \varphi \) gives simply (see Fig. 12; the angle \(\angle QAC=|\varphi -\alpha |\))

$$\begin{aligned} v(\varphi )\mathrm {d}\varphi =\frac{\Lambda -\lambda }{2}\cos (\varphi -\alpha )\mathrm {d}\alpha \ . \end{aligned}$$
(8.21)

We must now distinguish three situations. For \(q\in (\lambda ,\Lambda )\),

$$\begin{aligned} I_{c2}= & {} \int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{2v(\varphi )\ln \left| \tan \varphi \right| }{(\Lambda -\lambda )\cos [\varphi -\alpha (\varphi )]}\mathrm {d}\varphi \\= & {} \int \limits _0^{\frac{\pi }{2}} \left[ \frac{2v(\varphi )\ln \tan \varphi }{(\Lambda -\lambda )\cos [\varphi -\alpha (\varphi )]} +\frac{2v(-\varphi )\ln \tan \varphi }{(\Lambda -\lambda )\cos [-\varphi -\alpha (-\varphi )]} \right] \mathrm {d}\varphi . \end{aligned}$$

From Fig. 12, it follows that the triangles QCB and \(QCB_1\) are congruent. The points \(B_1\), Q, and A lie on the same straight line, and the triangle \(CAB_1\) is isosceles; therefore, the angles \(\angle QAC=\angle QBC\) and

$$\begin{aligned} \cos [-\varphi -\alpha (-\varphi )]=\cos [\varphi -\alpha (\varphi )] \end{aligned}$$
(8.22)

Moreover, the base of the triangle \(B_1CA\) has the length

$$\begin{aligned} AB_1=v(\varphi )+v(-\varphi )=(\Lambda -\lambda )\cos (\varphi -\alpha )\ , \end{aligned}$$
(8.23)

thus

$$\begin{aligned} I_{c2}=\int \limits _0^{\frac{\pi }{2}} \frac{[2v(\varphi )+2v(-\varphi )]\ln \tan \varphi }{(\Lambda -\lambda )\cos [\varphi -\alpha (\varphi )]}\mathrm {d}\varphi =\int \limits _0^{\frac{\pi }{2}}2\ln \tan \varphi \,\mathrm {d}\varphi =0\ , \end{aligned}$$

and the integral \(I_c\) does not depend on the parameter q.
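This q-independence can be cross-checked by direct quadrature of the \(\alpha \)-form of \(I_c\) preceding (8.17); the spectral bounds and the midpoint rule below are illustrative choices, not part of the proof:

```python
import numpy as np

lam, Lam = 1.0, 10.0   # illustrative spectral bounds

def I_c(q, m=400_001):
    # Midpoint rule for I_c(q) = integral of ln|1 - q/x(alpha)| d alpha over
    # (-pi/2, pi/2), with x(alpha) from the substitution (8.16).  The
    # logarithmic singularity at x(alpha) = q is integrable, so the rule
    # converges (slowly but safely); midpoints avoid the singular endpoints.
    h = np.pi / m
    alpha = -np.pi / 2.0 + h * (np.arange(m) + 0.5)
    x = (Lam + lam) / 2.0 + (Lam - lam) / 2.0 * np.sin(alpha)
    return h * np.sum(np.log(np.abs(1.0 - q / x)))

vals = [I_c(q) for q in (lam, 2.5, (lam + Lam) / 2.0, 7.5, Lam)]
print(vals)   # Lemma 1: all values agree up to quadrature error
```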

If \(q=\lambda \), then

$$\begin{aligned} I_{c2}=\int \limits _0^{\frac{\pi }{2}} \frac{2v(\varphi )\ln \tan \varphi }{(\Lambda -\lambda )\cos [\varphi -\alpha (\varphi )]}\mathrm {d}\varphi \ . \end{aligned}$$

In this case, the point Q coincides with the point \(B_1\), and we can directly express the value

$$\begin{aligned} v(\varphi )=AB_1=(\Lambda -\lambda )\cos (\varphi -\alpha )\ , \end{aligned}$$

and we get

$$\begin{aligned} I_{c2}=\int \limits _0^{\frac{\pi }{2}}2\ln \tan \varphi \mathrm {d}\varphi =0 \end{aligned}$$

again. We can finish the proof of Lemma 1 for \(q=\Lambda \) analogously.

As for the integral (3.2), after introducing the variable \(\alpha \) according to (8.16), we get

$$\begin{aligned} I_d=\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{\ln \frac{\left| q-\frac{\Lambda +\lambda }{2}-\frac{\Lambda -\lambda }{2}\sin \alpha \right| }{q}}{\frac{\Lambda +\lambda }{2}+\frac{\Lambda -\lambda }{2}\sin \alpha }\, \mathrm {d}\alpha \ , \end{aligned}$$

and the expression inside the absolute value in the numerator may be substituted using the variable \(\varphi \) defined by (8.18), (8.19)

$$\begin{aligned} I_d=\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{\ln \frac{\left| (\Lambda -\lambda )\cos \alpha \tan \varphi (\alpha )\right| }{2q}}{\frac{\Lambda +\lambda }{2}+\frac{\Lambda -\lambda }{2}\sin \alpha }\, \mathrm {d}\alpha =\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{\ln \frac{(\Lambda -\lambda )\cos \alpha }{2}}{\frac{\Lambda +\lambda }{2}+\frac{\Lambda -\lambda }{2}\sin \alpha }\, \mathrm {d}\alpha +\int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{\ln \frac{\left| \tan \varphi (\alpha )\right| }{q}}{\frac{\Lambda +\lambda }{2}+\frac{\Lambda -\lambda }{2}\sin \alpha }\, \mathrm {d}\alpha \ . \end{aligned}$$

The first integral on the right-hand side is finite and evidently does not depend on the parameter q; therefore, we will deal only with the second integral (denoted as \(I_{d1}\)). We change the integration variable to \(\varphi \). For \(q\in (\lambda ,\Lambda )\), we get, according to (8.21),

$$\begin{aligned} I_{d1}= & {} \int \limits _{-\frac{\pi }{2}}^{\frac{\pi }{2}} \frac{2v(\varphi )\ln \frac{\left| \tan \varphi \right| }{q}}{(\Lambda -\lambda )\left[ q+v(\varphi )\sin \varphi \right] \cos [\varphi -\alpha (\varphi )]}\, \mathrm {d}\varphi \ \nonumber \\= & {} \int \limits _0^{\frac{\pi }{2}}\left[ \frac{2v(\varphi )\ln \frac{\tan \varphi }{q}}{(\Lambda -\lambda )\left[ q+v(\varphi )\sin \varphi \right] \cos [\varphi -\alpha (\varphi )]}\right. \nonumber \\&\left. +\,\frac{2v(-\varphi )\ln \frac{\tan \varphi }{q}}{(\Lambda -\lambda )\left[ q-v(-\varphi )\sin \varphi \right] \cos [\varphi -\alpha (\varphi )]} \right] \mathrm {d}\varphi \ \nonumber \\= & {} \int \limits _0^{\frac{\pi }{2}} \frac{2q\ln \frac{\tan \varphi }{q}}{q^2+q[v(\varphi )-v(-\varphi )]\sin \varphi -v(\varphi )v(-\varphi )\sin ^2\varphi } \,\mathrm {d}\varphi \ . \end{aligned}$$
(8.24)

(We have used the results (8.22), (8.23) during the calculations.)

If we use the law of cosines for the triangle QCA in Fig. 12, we get the relation

$$\begin{aligned} v(\varphi )^2-(\Lambda +\lambda -2q)v(\varphi )\sin \varphi +\Bigl (\frac{\Lambda +\lambda }{2}-q\Bigr )^2 =\Bigl (\frac{\Lambda -\lambda }{2}\Bigr )^2\ ; \end{aligned}$$
(8.25)

this relation also holds for \(q>\frac{\Lambda +\lambda }{2}\) and may be interpreted as a quadratic equation for the unknown variable \(v(\varphi )\). This quadratic equation always has two different solutions with opposite signs, the positive solution being the value \(v(\varphi )\) itself.

The law of cosines applied to the triangle QCB gives the equation

$$\begin{aligned} v(-\varphi )^2+(\Lambda +\lambda -2q)v(-\varphi )\sin \varphi +\Bigl (\frac{\Lambda +\lambda }{2}-q\Bigr )^2 =\Bigl (\frac{\Lambda -\lambda }{2}\Bigr )^2\ , \end{aligned}$$
(8.26)

and so \(v\) solves Eq. (8.25) if and only if \((-v)\) satisfies Eq. (8.26). This implies that the negative solution to Eq. (8.25) is the opposite of the value \(v(-\varphi )\). Therefore, according to Vieta's formulas, we can write

$$\begin{aligned} v(\varphi )-v(-\varphi )= & {} (\Lambda +\lambda -2q)\sin \varphi \ ,\\ -v(\varphi )v(-\varphi )= & {} \Bigl (\frac{\Lambda +\lambda }{2}-q\Bigr )^2 -\Bigl (\frac{\Lambda -\lambda }{2}\Bigr )^2\ =\ (\Lambda -q)(\lambda -q)\ . \end{aligned}$$

After the substitution of these results into (8.24), we get

$$\begin{aligned} I_{d1}= & {} \int \limits _0^{\frac{\pi }{2}} \frac{2q\ln \frac{\tan \varphi }{q}}{q^2+q(\Lambda +\lambda -2q)\sin ^2\varphi +(\Lambda -q)(\lambda -q)\sin ^2\varphi } \,\mathrm {d}\varphi \nonumber \\= & {} \int \limits _0^{\frac{\pi }{2}} \frac{2q\ln \frac{\tan \varphi }{q}}{q^2\cos ^2\varphi +\Lambda \lambda \sin ^2\varphi } \,\mathrm {d}\varphi \ . \end{aligned}$$
(8.27)

Let us define the variable \(\psi \in \langle 0,\frac{\pi }{2})\) by the relation

$$\begin{aligned} \frac{\tan \psi }{\sqrt{\Lambda \lambda }}=\frac{\tan \varphi }{q}. \end{aligned}$$

The integral (8.27) expressed using this substitution is

$$\begin{aligned} I_{d1}=\int \limits _0^{\frac{\pi }{2}} \frac{2\ln \frac{\tan \psi }{\sqrt{\Lambda \lambda }}}{\sqrt{\Lambda \lambda }} \,\mathrm {d}\psi \ , \end{aligned}$$
(8.28)

i.e., its value does not depend on the parameter q.

If \(q=\lambda \), the relation (8.24) reduces to

$$\begin{aligned} I_{d1}=\int \limits _0^{\frac{\pi }{2}} \frac{2v(\varphi )\ln \frac{\tan \varphi }{q}}{(\Lambda -\lambda )\left[ q+v(\varphi )\sin \varphi \right] \cos [\varphi -\alpha (\varphi )]} \,\mathrm {d}\varphi =\int \limits _0^{\frac{\pi }{2}} \frac{2\ln \frac{\tan \varphi }{\lambda }}{\lambda +v(\varphi )\sin \varphi } \,\mathrm {d}\varphi . \end{aligned}$$

Since the angle \(\angle PCA=2\varphi \) in this case, the value \(v(\varphi )\) can be expressed as

$$\begin{aligned} v(\varphi )=(\Lambda -\lambda )\sin \varphi \end{aligned}$$

and

$$\begin{aligned} I_{d1}=\int \limits _0^{\frac{\pi }{2}} \frac{2\ln \frac{\tan \varphi }{\lambda }}{\lambda +(\Lambda -\lambda )\sin ^2\varphi } \,\mathrm {d}\varphi =\int \limits _0^{\frac{\pi }{2}} \frac{2\ln \frac{\tan \varphi }{\lambda }}{\lambda \cos ^2\varphi +\Lambda \sin ^2\varphi } \,\mathrm {d}\varphi \ . \end{aligned}$$

The substitution \(\frac{\tan \varphi }{\sqrt{\lambda }}=\frac{\tan \psi }{\sqrt{\Lambda }}\) gives the result (8.28) again; similarly, we get the same result for \(q=\Lambda \).
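The independence of \(I_{d1}\) of the parameter \(q\) can also be checked numerically. Since \(\int _0^{\pi /2}\ln \tan \psi \,\mathrm {d}\psi =0\), the representation (8.28) gives the closed form \(I_{d1}=-\frac{\pi \ln (\Lambda \lambda )}{2\sqrt{\Lambda \lambda }}\). The following sketch (with the arbitrary illustrative values \(\lambda =1\), \(\Lambda =4\)) approximates (8.27) by a midpoint rule and compares it with this value:

```python
import math

def I_d1(q, lam, Lam, n=200_000):
    """Midpoint-rule approximation of the integral (8.27):
    I_d1 = int_0^{pi/2} 2 q ln(tan(phi)/q)
           / (q^2 cos^2(phi) + Lam*lam*sin^2(phi)) dphi.
    The endpoint singularities are only logarithmic, hence integrable,
    and the midpoint rule never evaluates the integrand at an endpoint."""
    h = (math.pi / 2) / n
    s = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        s += 2 * q * math.log(math.tan(phi) / q) / (
            q**2 * math.cos(phi)**2 + Lam * lam * math.sin(phi)**2)
    return s * h

lam, Lam = 1.0, 4.0   # illustrative values; any 0 < lam < Lam will do
# closed form implied by (8.28), using int_0^{pi/2} ln(tan psi) dpsi = 0:
exact = -math.pi * math.log(Lam * lam) / (2 * math.sqrt(Lam * lam))
```

Every \(q\in \langle \lambda ,\Lambda \rangle \) then yields the same value \(\approx -1.0888\) up to the quadrature error.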

Appendix 3: Existence of the Integral J

Let us define

$$\begin{aligned}&\Omega _0=\textstyle \Omega \setminus \left( (\lambda ,\lambda +\frac{\Lambda }{\kappa (\mathsf {A})-1}\rangle \times \langle \lambda ,\Lambda \rangle \cup \langle \Lambda -\frac{\Lambda }{\kappa (\mathsf {A})-1},\Lambda )\times \langle \lambda ,\Lambda \rangle \cup \{|x-l|<\frac{\Lambda }{\kappa (\mathsf {A})-1}\}\right) ,\\&\Omega _k=\textstyle \left[ \Omega \setminus \left( (\lambda ,\lambda +\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)}\rangle \times \langle \lambda ,\Lambda \rangle \cup \langle \Lambda -\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)},\Lambda ) \times \langle \lambda ,\Lambda \rangle \right. \right. \nonumber \\&\quad \left. \left. \cup \{|x-l|<\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)}\}\right) \right] \setminus \bigcup _{j=0}^{k-1}\Omega _{j}\ \ \ \hbox {for}\ k\in \mathcal {N}\ . \end{aligned}$$
Fig. 13
figure 13

Splitting of the set \(\Omega \) into subsets \(\Omega _k\)

The sets \(\Omega _k\) for \(k\in \mathcal {N}\) consist of two V-shaped strips of width \(\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)}\); these are shown in Fig. 13. Note that if the inequality

$$\begin{aligned} 2\cdot \frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)}>\Lambda -\lambda \end{aligned}$$
(8.29)

is satisfied for some number \(k\), then the set \(\Omega _k\) is empty; geometrically, this corresponds to the fact that the point \(L_k\) in Fig. 13 lies to the right of the point \(R_k\). Conversely, when \(\Omega _k\ne \emptyset \), then

$$\begin{aligned} 2^k(\kappa (\mathsf {A})-1)\ge \frac{2\Lambda }{\Lambda -\lambda }>2\ ; \end{aligned}$$
(8.30)

this information will be used later.

The sets are pairwise disjoint, i.e., \(\Omega _i\cap \Omega _j=\emptyset \) for \(i\ne j\), and

$$\begin{aligned} \Omega =\bigcup _{k\ge 0}\Omega _k \subset \langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle \ . \end{aligned}$$
(8.31)

Now, we find bounds for the values \(f(x,l)\) with \([x,l]\in \Omega _0\). The argument of the logarithm in the definition of the function \(f(x,l)\) is

$$\begin{aligned} \left| 1-\frac{x}{l}\right| =\frac{|x-l|}{l} \ge \frac{\Lambda }{l(\kappa (\mathsf {A})-1)} \ge \frac{1}{\kappa (\mathsf {A})-1}\ , \end{aligned}$$
(8.32)

at the same time

$$\begin{aligned} \left| 1-\frac{x}{l}\right| \le \frac{\max \limits _{[x,l]\in \langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle } \left| l-x\right| }{\min \limits _{[x,l]\in \langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle }l} \le \frac{\Lambda -\lambda }{\lambda } =\kappa (\mathsf {A})-1 \end{aligned}$$
(8.33)

on the whole square \(\langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle \), so

$$\begin{aligned} \left| \ln \left| 1-\frac{x}{l}\right| \right| \le |\ln (\kappa (\mathsf {A})-1)| \end{aligned}$$

on \(\Omega _0\). When \(k>0\), the argument leading to (8.32) gives

$$\begin{aligned} \left| 1-\frac{x}{l}\right| >\frac{1}{2^k(\kappa (\mathsf {A})-1)} \end{aligned}$$

on \(\Omega _k\). If \(\Omega _k\ne \emptyset \), then \(\ln \left[ 2^k(\kappa (\mathsf {A})-1)\right] >0\) due to (8.30), and we can estimate

$$\begin{aligned} \left| \ln \left| 1-\frac{x}{l}\right| ^{}\right| \le \left| \ln \left[ 2^k(\kappa (\mathsf {A})-1)\right] \right| =\ln \left[ 2^k(\kappa (\mathsf {A})-1)\right] =k\ln 2+\ln (\kappa (\mathsf {A})-1)\ . \end{aligned}$$
(8.34)
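The bounds (8.32)-(8.34) are easy to validate by random sampling. In the sketch below (the values \(\lambda =1\), \(\Lambda =5\), hence \(\kappa (\mathsf {A})=5\), are arbitrary illustrative choices), the index \(k\) of the set \(\Omega _k\) containing a sampled point \([x,l]\) is determined from the distance of the point to the removed strips:

```python
import math, random

lam, Lam = 1.0, 5.0          # illustrative values; kappa(A) = Lam/lam = 5
kappa = Lam / lam

def omega_index(x, l):
    """Smallest k such that [x, l] lies outside all strips of width
    Lam/(2^k (kappa-1)), i.e. the index of the set Omega_k containing
    the point; None if the point lies on a strip boundary."""
    d = min(x - lam, Lam - x, abs(x - l))
    if d <= 0:
        return None
    k = 0
    while d < Lam / (2**k * (kappa - 1)):
        k += 1
    return k

random.seed(1)
violations = 0
for _ in range(20_000):
    x, l = random.uniform(lam, Lam), random.uniform(lam, Lam)
    k = omega_index(x, l)
    if k is None:
        continue
    # bound (8.34): |ln|1 - x/l|| <= k ln 2 + ln(kappa - 1)
    bound = k * math.log(2) + math.log(kappa - 1)
    if abs(math.log(abs(1 - x / l))) > bound + 1e-12:
        violations += 1
```

For \(k=0\) the checked bound reduces to the estimate valid on \(\Omega _0\); no sampled point violates it.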

The denominator of the expression (3.3) is bounded from below on \(\Omega _k\ne \emptyset \) by the value

$$\begin{aligned}&x\sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( x-\frac{\Lambda +\lambda }{2}\right) ^2}\nonumber \\&\quad \ge \left( \lambda +\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)}\right) \sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( \lambda +\frac{\Lambda }{2^k(\kappa (\mathsf {A})-1)} -\frac{\Lambda +\lambda }{2}\right) ^2}\nonumber \\&\quad =\lambda ^2\frac{(2^k+1)\kappa (\mathsf {A})-2^k}{2^k(\kappa (\mathsf {A})-1)} \sqrt{\left[ 1-\frac{\kappa (\mathsf {A})}{2^k(\kappa (\mathsf {A})-1)^2}\right] \frac{\kappa (\mathsf {A})}{2^k}} \ge \lambda ^2\sqrt{\frac{\kappa (\mathsf {A})}{2^{k+1}}}\ ;\qquad \end{aligned}$$
(8.35)

the last inequality is valid as a consequence of (8.30).
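The last inequality in (8.35) reduces, after dividing both sides by \(\lambda ^2\sqrt{\kappa (\mathsf {A})/2^k}\), to the claim that \(\frac{(2^k+1)\kappa (\mathsf {A})-2^k}{2^k(\kappa (\mathsf {A})-1)}\sqrt{1-\frac{\kappa (\mathsf {A})}{2^k(\kappa (\mathsf {A})-1)^2}}\ge \sqrt{\frac{1}{2}}\) whenever (8.30) holds. A brute-force numerical confirmation over arbitrary sample values of \(\kappa (\mathsf {A})\) and \(k\):

```python
import math

def omega_k_possible(kappa, k):
    """Relation (8.30): Omega_k can be nonempty only if
    2^k (kappa - 1) >= 2*kappa/(kappa - 1)."""
    return 2**k * (kappa - 1) >= 2 * kappa / (kappa - 1)

def reduced_lhs(kappa, k):
    """Left-hand side of the last inequality in (8.35), divided by
    lambda^2 * sqrt(kappa / 2^k)."""
    A = ((2**k + 1) * kappa - 2**k) / (2**k * (kappa - 1))
    B = math.sqrt(1 - kappa / (2**k * (kappa - 1)**2))
    return A * B

# the claim: reduced_lhs >= sqrt(1/2) whenever Omega_k may be nonempty
ok = all(reduced_lhs(kappa, k) >= math.sqrt(0.5)
         for kappa in (1.5, 2.0, 5.0, 50.0)
         for k in range(40) if omega_k_possible(kappa, k))
```

Note that (8.30) guarantees \(2^k(\kappa (\mathsf {A})-1)^2\ge 2\kappa (\mathsf {A})\), so the square root is always real.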

We can further estimate the measures

$$\begin{aligned} \mu (\Omega _0)\le & {} \Lambda -\lambda \ ,\nonumber \\ \mu (\Omega _k)\le & {} \frac{4\Lambda }{2^k(\kappa (\mathsf {A})-1)} \ \ \hbox {for}\ k\in \mathcal {N}\ , \end{aligned}$$
(8.36)
$$\begin{aligned} \mu \left( \langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle \setminus \bigcup _{i=0}^k\Omega _i\right)\le & {} \frac{4\Lambda }{2^k(\kappa (\mathsf {A})-1)} \end{aligned}$$
(8.37)

All these results follow easily from Fig. 13. The estimate (8.37) together with (8.31) implies

$$\begin{aligned} \mu \left( \langle \lambda ,\Lambda \rangle \times \langle \lambda ,\Lambda \rangle \right) =\mu (\Omega )\ . \end{aligned}$$

A sufficient condition for the existence of the integral J defined by (3.4) is the existence of the integral

$$\begin{aligned} \widetilde{J}= & {} \int \limits _{\Omega }|f(x,l)|\,\mathrm {d}\mu =\sum _{k\ge 0}\,\int \limits _{\Omega _k}|f(x,l)|\,\mathrm {d}\mu \nonumber \\= & {} \int \limits _{\Omega _0}|f(x,l)|\,\mathrm {d}\mu +\sum _{k\ge \min (\log _2\frac{2\kappa (\mathsf {A})}{(\kappa (\mathsf {A})-1)^2},1)} \ \int \limits _{\Omega _k}|f(x,l)|\,\mathrm {d}\mu \ . \end{aligned}$$
(8.38)

The condition \(k\ge \log _2\frac{2\kappa (\mathsf {A})}{(\kappa (\mathsf {A})-1)^2}\) follows from the relation (8.29). It holds

$$\begin{aligned} \,\int \limits _{\Omega _k}|f(x,l)|\,\mathrm {d}\mu \le \sup _{[x,l]\in \Omega _k}|f(x,l)|\mu (\Omega _k) \end{aligned}$$
(8.39)

for any \(k\ge 0\). The first integral on the right-hand side of (8.38) is zero if \(\Omega _0=\emptyset \); otherwise, according to (8.39),

$$\begin{aligned} \,\int \limits _{\Omega _0}|f(x,l)|\,\mathrm {d}\mu \le \frac{\ln (\kappa (\mathsf {A})-1)}{\lambda ^2\sqrt{\frac{\kappa (\mathsf {A})}{2}}}(\Lambda -\lambda )\ ; \end{aligned}$$

in any case, its value is finite. For \(k\ge k_0=\min (\log _2\frac{2\kappa (\mathsf {A})}{(\kappa (\mathsf {A})-1)^2},1)\), the terms in the sum on the right-hand side of (8.38) may be estimated by (8.39) using (8.34), (8.35), and (8.36):

$$\begin{aligned} \,\int \limits _{\Omega _k}|f(x,l)|\,\mathrm {d}\mu\le & {} \frac{k\ln 2+\ln (\kappa (\mathsf {A})-1)}{\lambda ^2\sqrt{\frac{\kappa (\mathsf {A})}{2^{k+1}}}} \cdot \frac{4\Lambda }{2^k(\kappa (\mathsf {A})-1)}\nonumber \\= & {} \frac{4\sqrt{2\kappa (\mathsf {A})}}{\lambda (\kappa (\mathsf {A})-1)} \cdot \frac{k\ln 2+\ln (\kappa (\mathsf {A})-1)}{\left( \sqrt{2}\right) ^k}\ . \end{aligned}$$

The sum satisfies

$$\begin{aligned} \sum _{k\ge k_0} \,\int \limits _{\Omega _k}|f(x,l)|\,\mathrm {d}\mu\le & {} \frac{4\sqrt{2\kappa (\mathsf {A})}}{\lambda (\kappa (\mathsf {A})-1)} \sum _{k\ge k_0} \frac{k\ln 2+\ln (\kappa (\mathsf {A})-1)}{\left( \sqrt{2}\right) ^k}\nonumber \\= & {} \frac{4\sqrt{2\kappa (\mathsf {A})}}{\lambda (\kappa (\mathsf {A})-1)} \cdot \left[ \frac{k_0\ln 2+\ln (\kappa (\mathsf {A})-1)}{\left( \sqrt{2}\right) ^{k_0-1}\left( \sqrt{2}-1\right) } +\frac{\ln 2}{\left( \sqrt{2}\right) ^{k_0-1}\left( \sqrt{2}-1\right) ^2}\right] \ . \end{aligned}$$

So, the series on the right-hand side of (8.38) is absolutely convergent, and the integral J exists.
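The closed-form evaluation of the sum \(\sum _{k\ge k_0}\frac{k\ln 2+\ln (\kappa (\mathsf {A})-1)}{(\sqrt{2})^k}\) used in the last display can be verified directly; a small sketch, where the values of \(k_0\) and of the constant \(c=\ln (\kappa (\mathsf {A})-1)\) are arbitrary:

```python
import math

LN2 = math.log(2.0)

def tail_closed_form(k0, c):
    """Closed form of sum_{k >= k0} (k ln 2 + c) / (sqrt 2)^k
    claimed in the display above."""
    r = math.sqrt(2.0)
    return ((k0 * LN2 + c) / (r**(k0 - 1) * (r - 1))
            + LN2 / (r**(k0 - 1) * (r - 1)**2))

def tail_partial(k0, c, terms=300):
    """Direct partial sum; the terms decay geometrically, so 300 of
    them are far more than enough for double precision."""
    return sum((k * LN2 + c) / math.sqrt(2.0)**k
               for k in range(k0, k0 + terms))
```

Both evaluations agree to machine precision for any \(k_0\ge 1\) and \(c\ge 0\).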

Appendix 4: Construction of the Optimal Step Length Distribution

First, we prove

Lemma 4

If a distribution function \(F_L(l)\) generates the function \(\widehat{E}_R(x)\) satisfying the conditions of Theorem 3, then the function \(\widehat{E}_R(x)\) is defined for all \(x\in \langle \lambda ,\Lambda \rangle \).

Proof

We show that for any point \(\hat{x}\) of the second or third type relative to the measure \(\phi \), there exists \(\delta >0\) such that \(\widehat{E}_R(x)\le \overline{E}_{\phi }-1\) for every point \(x\in \mathcal {D}_{\delta }(\hat{x})=(\hat{x}-\delta ,\hat{x}+\delta )\cap \langle \lambda ,\Lambda \rangle \) of the first type relative to the measure \(\phi \). Since almost all points \(x\in \mathcal {D}_{\delta }(\hat{x})\) are of the first type relative to the measure \(\phi \) (a consequence of Theorem 2) and \(\nu \left( \mathcal {D}_{\delta }(\hat{x})\right) \ge \frac{2\delta }{\Lambda (\Lambda -\lambda )}>0\), the conditions of Theorem 3 cannot be satisfied in the considered case.

Let \(\hat{x}\in \langle \lambda ,\Lambda \rangle \) be a point of the third type relative to the measure \(\phi \). By definition, \(\phi (\hat{x})=m>0\). If \(x\in \langle \lambda ,\Lambda \rangle \) is a point of the first type relative to the measure \(\phi \), then the mean value \(\widehat{E}_R(x)\) exists, and

$$\begin{aligned} \widehat{E}_R(x)= & {} \int \limits _{\langle \lambda ,\hat{x})}\ln \left| 1-\frac{x}{l}\right| \,\mathrm {d}\phi (l) +m\cdot \ln \left| 1-\frac{x}{\hat{x}}\right| +\int \limits _{(\hat{x},\Lambda \rangle }\ln \left| 1-\frac{x}{l}\right| \,\mathrm {d}\phi (l)\nonumber \\\le & {} (1-m)\ln (\kappa (\mathsf {A})-1) +m\cdot \ln \left| 1-\frac{x}{\hat{x}}\right| \end{aligned}$$
(8.40)

due to (8.33). Using the symbols applied in the proof of Theorem 3, let us set

$$\begin{aligned} \delta =\hat{x}\cdot \frac{e^{\frac{I_d}{m\nu (\langle \lambda ,\Lambda \rangle )}}}{e^{\frac{1}{m}}\left( \kappa (\mathsf {A})-1\right) ^{\frac{1}{m}-1}}>0\ . \end{aligned}$$

For any \(x\in \mathcal {D}_{\delta }(\hat{x})\) of the first type relative to the measure \(\phi \), the right-hand side of (8.40) may be estimated as

$$\begin{aligned} \widehat{E}_R(x)\le (1-m)\ln (\kappa (\mathsf {A})-1) +m\cdot \ln \frac{\delta }{\hat{x}} =\frac{I_d}{\nu (\langle \lambda ,\Lambda \rangle )}-1 \le \overline{E}_{\phi }-1 \end{aligned}$$

according to (3.10).

Now assume \(\hat{x}\) is a point of the second type relative to the measure \(\phi \). As a consequence of Lemma 3, there exists \(\delta \in (0,\frac{\hat{x}}{2})\) such that

$$\begin{aligned} \int \limits _{ \langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1-\frac{\hat{x}}{l}\right| \mathrm {d}\phi (l) <\overline{E}_{\phi }-1-2\ln 2\ . \end{aligned}$$

For any \(x\in \mathcal {D}_{\delta }(\hat{x})\) of the first type relative to the measure \(\phi \), the mean value

$$\begin{aligned} \widehat{E}_R(x)= & {} \int \limits _{\langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )}\ln \left| 1-\frac{x}{l}\right| \,\mathrm {d}\phi (l) +\int \limits _{\mathcal {D}_{\delta }(\hat{x})}\ln \left| 1-\frac{x}{l}\right| \,\mathrm {d}\phi (l)\nonumber \\\le & {} \int \limits _{\langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \left( \ln \left| 1-\frac{\hat{x}}{l}\right| +\ln \left| \frac{l-x}{l-\hat{x}}\right| \right) \,\mathrm {d}\phi (l) +\int \limits _{\mathcal {D}_{\delta }(\hat{x})}\ln 2\,\mathrm {d}\phi (l)\nonumber \\< & {} \overline{E}_{\phi }-1-2\ln 2 +\int \limits _{\langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1+\frac{\hat{x}-x}{l-\hat{x}}\right| \,\mathrm {d}\phi (l) +\ln 2\ . \end{aligned}$$
(8.41)

We can express the value x in the form \(x=\hat{x}+\alpha \delta \) for some \(\alpha \in (-1,1)\). Suppose first \(\alpha >0\). Then,

$$\begin{aligned} \int \limits _{\langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1+\frac{\hat{x}-x}{l-\hat{x}}\right| \,\mathrm {d}\phi (l)= & {} \int \limits _{(0,\hat{x}-\delta )\cap \langle \lambda ,\hat{x}\rangle } \ln \left| 1+\frac{\alpha \delta }{\hat{x}-l}\right| \,\mathrm {d}\phi (l)\\&+\int \limits _{\langle \hat{x}+\delta ,\infty )\cap (\hat{x},\Lambda \rangle } \ln \left| 1-\frac{\alpha \delta }{l-\hat{x}}\right| \,\mathrm {d}\phi (l)\ . \end{aligned}$$

The integrand in the second integral on the right-hand side is always non-positive, and the denominator of the fraction in the first integral is always greater than or equal to \(\delta \); therefore, \(1+\frac{\alpha \delta }{\hat{x}-l}\le 1+\alpha <2\). This implies

$$\begin{aligned} \int \limits _{\langle \lambda ,\Lambda \rangle \setminus (\hat{x}-\delta ,\hat{x}+\delta )} \ln \left| 1+\frac{\hat{x}-x}{l-\hat{x}}\right| \,\mathrm {d}\phi (l) <\int \limits _{(0,\hat{x}-\delta )\cap \langle \lambda ,\hat{x}\rangle } \ln 2\,\mathrm {d}\phi (l)\le \ln 2 \end{aligned}$$

and in (8.41)

$$\begin{aligned} \widehat{E}_R(x)<\overline{E}_{\phi }-1\ . \end{aligned}$$

Analogously, we get the same result for \(\alpha <0\). \(\square \)

As a consequence of Lemma 4, the desired distribution function \(F_L(l)\) is continuous; in particular, \(F_L(\lambda )=0\). This fact will be used later.

The function \(\ln \left| 1-\frac{x}{l}\right| \) may be expanded into a Fourier series of Chebyshev polynomials of the first kind in the variable \(l\); the Fourier coefficients \(c_n(x)\) are calculated in “Appendix 5.” We show that the partial sums of this series are bounded by an integrable function for all \(x\in \langle \lambda ,\Lambda \rangle \), provided the measure \(\phi \) generates a function \(\widehat{E}_R(x)\) satisfying the condition (3.6). Indeed, using the notation

$$\begin{aligned} \alpha =\arccos \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\ , \end{aligned}$$
(8.42)

and applying the results (8.61), (8.62), we get (the parameters \(\xi \), \(\zeta \) defined by (8.63) and (8.64), respectively, are used)

$$\begin{aligned} Z_k(l,x)= & {} c_0+\sum _{n=1}^k \sqrt{2}c_n(x)T_n(\cos \alpha )\ =\ -\zeta +\sum _{n=1}^k\frac{2}{n}(e^{-n\zeta }-\cos \,n\xi )\cos \,n\alpha \nonumber \\= & {} -\zeta +\sum _{n=1}^k\frac{2}{n}\left( e^{-n\zeta }-1\right) \cos \,n\alpha -\sum _{n=1}^k\frac{2}{n}\left( \cos \,n\xi -1\right) \cos \,n\alpha \nonumber \\= & {} -\zeta -\sum _{n=1}^k\int \limits _0^{\zeta }2e^{-nt}\cos \,n\alpha \,\mathrm {d}t +\sum _{n=1}^k\int \limits _0^{\xi }2\sin \,nt\cdot \cos \,n\alpha \,\mathrm {d}t\nonumber \\= & {} -\zeta +\int \limits _0^{\zeta }\left( 1-e^{-kt}\cos \,k\alpha -\frac{\sinh \,t\left( 1-e^{-kt}\cos \,k\alpha \right) }{\cosh \,t-\cos \alpha }\nonumber \right. \\&\left. -\frac{e^{-kt}\sin \,k\alpha \cos \frac{\alpha }{2}\sin \frac{\alpha }{2}}{\sinh ^2\frac{t}{2}+\sin ^2\frac{\alpha }{2}}\right) \,\mathrm {d}t\nonumber \\&+\int \limits _{\alpha }^{\xi +\alpha } \frac{\cos \frac{u}{2}-\cos (k+\frac{1}{2})u}{2\sin \frac{u}{2}}\,\mathrm {d}u +\int \limits _{-\alpha }^{\xi -\alpha } \frac{\cos \frac{u}{2}-\cos (k+\frac{1}{2})u}{2\sin \frac{u}{2}}\,\mathrm {d}u\ .\quad \end{aligned}$$
(8.43)

The fraction integrated in the last integral on the right-hand side of (8.43) is an odd and bounded function (it is a sum of \(k\) sines); therefore, it is integrable on any bounded interval. Due to the oddness of the integrand, the last integral equals the integral of the same function over the interval \((\alpha ,|\alpha -\xi |)\). Now, we can estimate the sum \(Z_k(l,x)\):

$$\begin{aligned} |Z_k(l,x)|\le & {} 3\zeta +\int \limits _0^{\zeta }\frac{2\sinh \,t}{\cosh \,t-\cos \alpha }\,\mathrm {d}t +\left| \sin \,k\alpha \cos \frac{\alpha }{2}\right| \int \limits _0^{\zeta } \frac{\cosh \frac{t}{2}\sin \frac{\alpha }{2}}{\sinh ^2\frac{t}{2}+\sin ^2\frac{\alpha }{2}}\,\mathrm {d}t\nonumber \\&+\int \limits _{\alpha }^{\xi +\alpha } \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u +\left| \int \limits _{\alpha }^{|\alpha -\xi |} \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u\right| \ . \end{aligned}$$
(8.44)

The absolute value of the last integral in (8.44) is necessary because the ordering of its limits depends on the relation between the values \(\alpha \) and \(\xi \). Nevertheless, the integrand is always a positive function, and since \((\alpha ,|\alpha -\xi |)\subset (\alpha ,\alpha +\xi )\) in the case \(\alpha <|\alpha -\xi |\) and \((|\alpha -\xi |,\alpha )\subset (|\alpha -\xi |,\alpha +\xi )\) for \(\alpha \ge |\alpha -\xi |\), we can estimate

$$\begin{aligned} \left| \int \limits _{\alpha }^{|\alpha -\xi |} \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u\right| \le \int \limits _{\alpha }^{\alpha +\xi } \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u +\int \limits _{|\alpha -\xi |}^{\alpha +\xi } \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u\ . \end{aligned}$$
(8.45)

The integrals on the right-hand side of (8.45) have their limits in ascending order, because \(\alpha \in \langle 0,\pi \rangle \) and \(\xi \in \langle 0,\pi \rangle \). Furthermore, the latter fact implies \((\alpha ,\alpha +\xi )\subset (\alpha ,\alpha +\pi )\), and we can continue estimating as follows:

$$\begin{aligned} |Z_k(l,x)|\le & {} 3\zeta +2\ln \frac{\cosh \zeta -\cos \alpha }{1-\cos \alpha } +2\left| \sin \,k\alpha \cos \frac{\alpha }{2}\right| \arctan \frac{\sinh \frac{\zeta }{2}}{\sin \frac{\alpha }{2}}\nonumber \\&+2\int \limits _{\alpha }^{\alpha +\pi } \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u +\int \limits _{|\alpha -\xi |}^{\alpha +\xi } \frac{1}{\sin \frac{u}{2}}\,\mathrm {d}u\ \nonumber \\\le & {} 3\zeta +2\ln \frac{l}{l-\lambda }+\pi +4\ln \frac{\tan \frac{\alpha +\pi }{4}}{\tan \frac{\alpha }{4}} +2\ln \frac{\tan \frac{\alpha +\xi }{4}}{\tan \frac{|\alpha -\xi |}{4}}\nonumber \\= & {} 3\zeta +\pi -2\ln \left( 1-\frac{\lambda }{l}\right) \nonumber \\&+\,2\ln \frac{4(1+\sin \frac{\alpha }{2})^2(1+\cos \frac{\alpha }{2})^2}{(1+\cos \alpha )(1-\cos \alpha )} +\,2\ln \frac{2(\sin \frac{\alpha }{2}+\sin \frac{\xi }{2})^2}{|\cos \xi -\cos \alpha |} \nonumber \\\le & {} 3\zeta +\pi -2\ln \left( 1-\frac{\lambda }{l}\right) +2\ln \frac{16(\Lambda -\lambda )^2}{(\Lambda -l)(l-\lambda )} +2\ln \frac{4(\Lambda -\lambda )}{|l-x|} \nonumber \\= & {} 3\zeta +\pi -2\ln \left( 1-\frac{\lambda }{l}\right) +12\ln 2+6\ln \frac{\Lambda -\lambda }{l}\nonumber \\&-\,2\ln \frac{\Lambda -l}{l}-2\ln \frac{l-\lambda }{l} -2\ln \frac{|l-x|}{l}\ \nonumber \\\le & {} 3\zeta +\pi +12\ln 2+6\ln (\kappa (\mathsf {A})-1) -4\ln \left( 1-\frac{\lambda }{l}\right) \nonumber \\&-\,2\ln \left| 1-\frac{\Lambda }{l}\right| -2\ln \left| 1-\frac{x}{l}\right| \ =\ z_x(l). \end{aligned}$$
(8.46)

The procedure resulting in (8.46) is fully correct only when \(\alpha >0\); in the case \(\alpha =0\), the integration of the term \(\frac{\cosh \frac{t}{2}\sin \frac{\alpha }{2}}{\sinh ^2\frac{t}{2}+\sin ^2\frac{\alpha }{2}}\) in (8.44) does not give the antiderivative \(2\arctan \frac{\sinh \frac{t}{2}}{\sin \frac{\alpha }{2}}\). In this case, however, the integrand is zero except at the point \(t=0\) (where it is undefined); therefore, the corresponding integral is zero, and the estimate (8.46) holds as well.

As follows from Lemma 4, if the measure \(\phi \) satisfies our requirements, then the integral

$$\begin{aligned} \int \limits _{\langle \lambda ,\Lambda \rangle }\ln \left| 1-\frac{x}{l}\right| \,\mathrm {d}\phi (l) \end{aligned}$$

exists for any \(x\in \langle \lambda ,\Lambda \rangle \), in particular for \(x=\lambda \) and \(x=\Lambda \). This implies that the partial sums \(\{Z_k(l,x)\}_{k\ge 0}\) are, for any \(x\in \langle \lambda ,\Lambda \rangle \), dominated by a common integrable function \(z_x(l)\), and due to the Lebesgue dominated convergence theorem,

$$\begin{aligned} \widehat{E}_R(x)= & {} \int \limits _{\lambda }^{\Lambda }\left[ c_0+\sum _{n\ge 1} \sqrt{2}c_n(x) T_n\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) \right] \mathrm {d}\phi (l)\nonumber \\= & {} c_0+\sqrt{2}\sum _{n\ge 1}c_n(x)\int \limits _{\lambda }^{\Lambda } T_n\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) \mathrm {d}\phi (l) \end{aligned}$$
(8.47)

for all \(x\in \langle \lambda ,\Lambda \rangle \).

The integrals on the right-hand side of (8.47) may be modified using the integration by parts

$$\begin{aligned} \int \limits _{\lambda }^{\Lambda } T_n\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) \,\mathrm {d}\phi (l) =(-1)^n+\frac{2}{\Lambda -\lambda }\int \limits _{\lambda }^{\Lambda } nU_{n-1}\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) F_L(l)\,\mathrm {d}l \end{aligned}$$

(here the basic properties of the distribution function \(F_L\) are used: \(F_L(\Lambda )=1\), and \(F_L(\lambda )=0\) due to its necessary continuity). We again use the substitution (8.42); the distribution function \(F_L(l)\) changes over to the function \(\widetilde{F}_L(\alpha )=F_L(\frac{\Lambda +\lambda }{2}-\frac{\Lambda -\lambda }{2}\cos \alpha )\), and

$$\begin{aligned} \int \limits _{\lambda }^{\Lambda } T_n\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) \,\mathrm {d}\phi (l) =(-1)^n+n\int \limits _0^{\pi } \widetilde{F}_L(\alpha )\,\sin \,n\alpha \,\mathrm {d}\alpha \ . \end{aligned}$$

The function \(\widetilde{F}_L(\alpha )\) may be extended to an odd function on the interval \(\langle -\pi ,\pi \rangle \) by defining

$$\begin{aligned} \breve{F}_L(\alpha )=\left\{ \begin{array}{ll} \widetilde{F}_L(\alpha )&{}\hbox {for }\alpha \ge 0,\\ -\widetilde{F}_L(-\alpha )&{} \hbox {for }\alpha \le 0 \end{array}\right. \end{aligned}$$

(\(\breve{F}_L(0)=\widetilde{F}_L(0)=0\) according to its definition). The function \(\breve{F}_L\) is continuous, bounded on \(\langle -\pi ,\pi \rangle \) and odd; therefore, \(\breve{F}_L\in L_2(-\pi ,\pi )\), and it may be expressed in the form of a trigonometric series

$$\begin{aligned} \breve{F}_L(\alpha )=\sum _{j\ge 1}d_j\sin \,j\alpha \ . \end{aligned}$$
(8.48)

The expansion (8.48) may also be used on the interval \(\langle 0,\pi \rangle \), where \(\widetilde{F}_L(\alpha )=\breve{F}_L(\alpha )\); then

$$\begin{aligned} \int \limits _{\lambda }^{\Lambda } T_n\left( \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\right) \,\mathrm {d}\phi (l) =(-1)^n+n\int \limits _0^{\pi } \sin \,n\alpha \sum _{j\ge 1}d_j\sin \,j\alpha \,\mathrm {d}\alpha =(-1)^n+n\frac{\pi }{2}d_n\ . \end{aligned}$$
(8.49)

Substituting the results (8.49) and (8.61) into (8.47), we get

$$\begin{aligned} \widehat{E}_R(x)= & {} c_0+2\sum _{n\ge 1}\frac{1}{n}\left( (-1)^n+n\frac{\pi }{2}d_n\right) \left[ e^{-n\zeta } -\cos \left( n\arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\right) \right] \nonumber \\= & {} c_0+2\sum _{n\ge 1}\left( \frac{(-1)^n}{n}+\frac{\pi }{2}d_n\right) \left[ e^{-n\zeta } -T_n\left( \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\right) \right] \ . \end{aligned}$$
(8.50)

The expression on the right-hand side of (8.50) may be written in the form

$$\begin{aligned} \widehat{E}_R(x)=\sum _{n\ge 0}b_n\, T_n\left( \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\right) \ . \end{aligned}$$
(8.51)

The coefficient \(b_0\) is finite, because \(|d_n|\le \frac{\Vert \breve{F}_L\Vert }{\sqrt{\pi }}\le \sqrt{2}\) for any \(n\in \mathcal {N}\) due to (8.48) and

$$\begin{aligned} |b_0|= & {} \left| c_0 +2\sum _{n\ge 1}\left( \frac{(-1)^n}{n}+\frac{\pi }{2}d_n\right) e^{-n\zeta }\right| \le \left| c_0\right| +2\sum _{n\ge 1}\left( 1+\frac{\pi }{\sqrt{2}}\right) e^{-n\zeta }\\\le & {} \left| c_0\right| +\left( 2+\pi \sqrt{2}\right) \frac{e^{-\zeta }}{1-e^{-\zeta }}\ . \end{aligned}$$

Moreover, since \(b_n=-\frac{2(-1)^n}{n}-\pi d_n\) and both the sequences \(\{\frac{(-1)^n}{n}\}_{n\ge 1}\), \(\{d_n\}_{n\ge 1}\) belong to the space \(l_2\), also \(\{b_n\}_{n\ge 0}\in l_2\), and the series (8.51) is a Fourier expansion of a function \(\widehat{E}_R(x)\in L_{2,f}(\lambda ,\Lambda )\) with the weight function \(f=\frac{1}{\sqrt{(\Lambda -x)(x-\lambda )}}\).

To satisfy (3.6), the function \(\widehat{E}_R(x)\) must necessarily be constant almost everywhere on \(\langle \lambda ,\Lambda \rangle \). The Fourier series representation (8.51) of such a function must have all coefficients \(b_n=0\) for \(n\ge 1\), so

$$\begin{aligned} d_n=\frac{2}{\pi }\cdot \frac{(-1)^{n-1}}{n} \end{aligned}$$
(8.52)

for all \(n\in \mathcal {N}\). Therefore, the Fourier series of the distribution function \(\widetilde{F}_L(\alpha )\) is uniquely determined.

Let us suppose there exist two distribution functions \(\widetilde{F}_1(\alpha )\), \(\widetilde{F}_2(\alpha )\) satisfying our requirements, with extensions \(\breve{F}_1(\alpha )\), \(\breve{F}_2(\alpha )\). The functions \(\breve{F}_i(\alpha )\) are continuous, as proved above, and they have bounded variation (equal to 2, since the \(\widetilde{F}_i(\alpha )\) are distribution functions). The trigonometric series (8.48) of both functions have the coefficients (8.52). According to the Dirichlet–Jordan theorem, this series converges uniformly to the functions \(\breve{F}_1(\alpha )\), \(\breve{F}_2(\alpha )\) on any closed interval \(\langle a,b\rangle \subset (-\pi ,\pi )\); therefore, \(\breve{F}_1(\alpha )=\breve{F}_2(\alpha )\) for all \(\alpha \in (-\pi ,\pi )\). The equality \(\breve{F}_1(\pi )=\breve{F}_2(\pi )\) follows from the fact that \(\widetilde{F}_L(\pi )=F_L(\Lambda )=1\) for any distribution function \(F_L\). This implies that there exists at most one distribution function \(F_L(l)\) generating a function \(\widehat{E}_R(x)\) that satisfies the condition (3.6).

The last task is to find the distribution function \(F_L(l)\) corresponding to the series (8.48) with the coefficients (8.52). It is known that the series

$$\begin{aligned} \sum _{n\ge 1}\frac{2}{n}(-1)^{n-1}\sin \,nx \end{aligned}$$

is the trigonometric expansion of the function \(x\) on the interval \((-\pi ,\pi )\). Using this fact and the requirement \(\widetilde{F}_L(\pi )=1\), we get

$$\begin{aligned} \widetilde{F}_L(\alpha )= & {} \frac{\alpha }{\pi }\ ,\nonumber \\ F_L(l)= & {} \frac{1}{\pi }\arccos \frac{\Lambda +\lambda -2l}{\Lambda -\lambda }\ . \end{aligned}$$
(8.53)
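As a numerical illustration that (8.53) indeed makes \(\widehat{E}_R(x)\) constant: under the substitution \(l=\frac{\Lambda +\lambda }{2}-\frac{\Lambda -\lambda }{2}\cos \alpha \), the measure \(\mathrm {d}F_L\) becomes \(\frac{\mathrm {d}\alpha }{\pi }\), so \(\widehat{E}_R(x)\) is a one-dimensional integral accessible to a midpoint rule. The sketch below uses the arbitrary values \(\lambda =1\), \(\Lambda =4\); the constant it reproduces, \(\ln \frac{\sqrt{\Lambda }-\sqrt{\lambda }}{\sqrt{\Lambda }+\sqrt{\lambda }}\), equals the Chebyshev convergence rate mentioned in the abstract (this closed form is our own observation, not quoted from the text):

```python
import math

lam, Lam = 1.0, 4.0   # illustrative spectrum bounds

def E_R(x, n=200_000):
    """E_R(x) = int ln|1 - x/l| dF_L(l) for the arcsine law (8.53),
    computed by a midpoint rule in alpha after substituting
    l = (Lam+lam)/2 - (Lam-lam)/2 * cos(alpha), dF_L = dalpha/pi.
    The logarithmic singularity at l = x is integrable and is never
    hit exactly by a midpoint node."""
    h = math.pi / n
    s = 0.0
    for k in range(n):
        a = (k + 0.5) * h
        l = (Lam + lam) / 2 - (Lam - lam) / 2 * math.cos(a)
        s += math.log(abs(1.0 - x / l))
    return s / n

# conjectured common value of E_R(x) for all x in (lam, Lam):
rate = math.log((math.sqrt(Lam) - math.sqrt(lam))
                / (math.sqrt(Lam) + math.sqrt(lam)))
```

For \(\lambda =1\), \(\Lambda =4\) the common value is \(\ln \frac{1}{3}\approx -1.0986\), independent of the choice of \(x\) inside the interval.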

Appendix 5: Variance of the Process with Optimal Step Length Distribution

The calculations in this appendix are technically involved, and the details would require considerable space. Therefore, we usually present only their basic principles and the final results.

Considering the distribution (3.13) of the random variable L, we express the variance \(\widehat{\sigma }^2_R(x)\) using the second moment

$$\begin{aligned} \widehat{\sigma }^2_R(x)=\int \limits _{\lambda }^{\Lambda } \frac{\left( \ln \left| 1-\frac{x}{l}\right| \right) ^2}{\pi \sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( l-\frac{\Lambda +\lambda }{2}\right) ^2}} \,\mathrm {d}l-\left( \widehat{E}_R(x)\right) ^2\ . \end{aligned}$$
(8.54)

The integral in (8.54) exists, because all the present singularities are integrable. The probability density f defined by (3.13) may be considered a weight function on \((\lambda ,\Lambda )\); the existence of the integral in (8.54) implies

$$\begin{aligned} \ln \left| 1-\frac{x}{l}\right| \in L_{2,f}(\lambda ,\Lambda ), \end{aligned}$$

and we can write

$$\begin{aligned} \widehat{\sigma }^2_R(x) =\left\| \ln \left| 1-\frac{x}{l}\right| \,\right\| ^2 _{L_{2,f}(\lambda ,\Lambda )} -\left( \widehat{E}_R(x)\right) ^2\ . \end{aligned}$$
(8.55)

If \(\mathcal {B}=\{\mathsf {b}_i\}_{i\ge 0}\) is an orthonormal basis of a real Hilbert space \(\mathcal {H}\) and \(\mathsf {u}\in \mathcal {H}\), then we can express the vector \(\mathsf {u}\) as the Fourier series

$$\begin{aligned} \mathsf {u}=\sum _{i\ge 0}c_i\cdot \mathsf {b}_i =\sum _{i\ge 0}\langle \mathsf {b}_i|\mathsf {u}\rangle _{\mathcal {H}}\cdot \mathsf {b}_i, \end{aligned}$$

and its norm satisfies Parseval's identity

$$\begin{aligned} \Vert \mathsf {u}\Vert ^2=\sum _{i\ge 0}c_i^2 =\sum _{i\ge 0} \langle \mathsf {b}_i|\mathsf {u}\rangle _{\mathcal {H}}^2\ . \end{aligned}$$
(8.56)

A standard orthonormal basis in the Hilbert space \(L_{2,f}(\lambda ,\Lambda )\) with the weight (3.13) is constituted by the Chebyshev polynomials of the first kind

$$\begin{aligned} \mathsf {b}_0= & {} T_0(l)\ \ =\ \ 1\ ,\\ \mathsf {b}_n= & {} \sqrt{2}\cdot T_n \left( \frac{\lambda +\Lambda -2l}{\Lambda -\lambda }\right) \ \ =\ \ \sqrt{2}\cdot \cos \left( n\cdot \arccos \frac{\lambda +\Lambda -2l}{\Lambda -\lambda }\right) \ \ \hbox {for }n\in \mathcal {N}\ . \end{aligned}$$

Thus, the problem of expressing the variance is transformed into the calculation of the Fourier coefficients

$$\begin{aligned} c_0(x)= & {} \int \limits _{\lambda }^{\Lambda } \frac{\ln \left| 1-\frac{x}{l}\right| }{\pi \sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( l-\frac{\Lambda +\lambda }{2}\right) ^2}} \,\mathrm {d}l\ =\ \widehat{E}_R(x)\ ,\\ c_n(x)= & {} \sqrt{2}\int \limits _{\lambda }^{\Lambda } \frac{\ln \left| 1-\frac{x}{l}\right| \cdot T_n\left( \frac{2l-(\lambda +\Lambda )}{\Lambda -\lambda }\right) }{\pi \sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( l-\frac{\Lambda +\lambda }{2}\right) ^2}} \,\mathrm {d}l\ \hbox { for }n\in \mathcal {N}\ .\nonumber \end{aligned}$$
(8.57)

The relation (8.55) may then be written, using (8.56) and (8.57), in the form

$$\begin{aligned} \widehat{\sigma }^2_R(x)=\sum _{n\ge 1}c_n^2(x)\ . \end{aligned}$$
(8.58)
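A numerical cross-check of (8.58) against the direct definition (8.54) is possible once the coefficients are known. The sketch below assumes the closed forms \(\zeta =\mathrm {arccosh}\,\frac{\Lambda +\lambda }{\Lambda -\lambda }\), \(\xi =\arccos \frac{\Lambda +\lambda -2x}{\Lambda -\lambda }\), and \(c_n(x)=\frac{\sqrt{2}}{n}\left( e^{-n\zeta }-\cos n\xi \right) \), as read off from the structure of the expansion (8.43); these are our reconstruction of (8.61)-(8.64), which are not reproduced in this excerpt, and the values \(\lambda =1\), \(\Lambda =4\), \(x=1.7\) are arbitrary:

```python
import math

lam, Lam, x = 1.0, 4.0, 1.7   # illustrative values

# direct moments of ln|1 - x/l| under the arcsine density (3.13),
# via the substitution l = (Lam+lam)/2 - (Lam-lam)/2 * cos(alpha)
n = 200_000
h = math.pi / n
m1 = m2 = 0.0
for k in range(n):
    a = (k + 0.5) * h
    l = (Lam + lam) / 2 - (Lam - lam) / 2 * math.cos(a)
    v = math.log(abs(1.0 - x / l))
    m1 += v / n
    m2 += v * v / n
var_direct = m2 - m1 * m1          # right-hand side of (8.54)

# Parseval sum (8.58) with the assumed coefficient formula
# c_n(x) = sqrt(2) * (exp(-n*zeta) - cos(n*xi)) / n:
zeta = math.acosh((Lam + lam) / (Lam - lam))
xi = math.acos((Lam + lam - 2 * x) / (Lam - lam))
var_series = sum(2 * (math.exp(-k * zeta) - math.cos(k * xi))**2 / k**2
                 for k in range(1, 100_000))
```

The two computations should agree up to the quadrature error; in the same run, the mean \(m_1\) matches \(c_0=-\zeta \), consistent with \(c_0(x)=\widehat{E}_R(x)\) in (8.57).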

Let \(\Omega _{\varepsilon }(x)=(\lambda ,\Lambda )\times \langle 0,x\rangle \setminus \{[l,t]:|l-t|<\varepsilon \}\) for \(x\in \langle \lambda ,\Lambda \rangle \), \(\varepsilon \in (0,\min (\lambda ,\frac{\Lambda -\lambda }{3}))\), and

$$\begin{aligned} g_n(l,t)=\frac{T_n\left( \frac{\lambda +\Lambda -2l}{\Lambda -\lambda }\right) }{\pi (t-l)\sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( l-\frac{\Lambda +\lambda }{2}\right) ^2}} \end{aligned}$$

for integer \(n\ge 0\). We express the integral

$$\begin{aligned} C_n(\varepsilon ,x)=\int \limits _{\Omega _{\varepsilon }(x)}g_n(l,t) \,\mathrm {d}l\,\mathrm {d}t \end{aligned}$$

The integral \(C_n(\varepsilon ,x)\) exists for arbitrary \(x\in \langle \lambda ,\Lambda \rangle \) and any \(\varepsilon \in (0,\min \left( \lambda ,\frac{\Lambda -\lambda }{3}\right) )\), because the singularity \(\frac{1}{\sqrt{(\Lambda -\lambda )^2-(2l-\Lambda -\lambda )^2}}\) is integrable. So, according to Fubini's theorem,

$$\begin{aligned} C_n(\varepsilon ,x)=\int \limits _{\lambda }^{\Lambda }k_n(l)\,\mathrm {d}l \end{aligned}$$
(8.59)

where

$$\begin{aligned} k_n(l)=\int \limits _{\langle 0,x\rangle \setminus (l-\varepsilon ,l+\varepsilon )} g_n(l,t)\,\mathrm {d}t\ . \end{aligned}$$

After the integration (8.59), we get

$$\begin{aligned} c_n(x)=\sqrt{2}\lim _{\varepsilon \rightarrow 0^+}C_n(\varepsilon ,x) \end{aligned}$$
(8.60)

for \(n\in \mathcal {N}\); an analogous result without the factor \(\sqrt{2}\) holds for the value \(c_0\).

The change in the integration order when computing the integral \(C_n(\varepsilon ,x)\) gives

$$\begin{aligned} C_n(\varepsilon ,x)=\int \limits _0^x h_n(t,\varepsilon )\,\mathrm {d}t \end{aligned}$$

where

$$\begin{aligned} h_n(t,\varepsilon )=\int \limits _ {\langle \lambda ,\Lambda \rangle \setminus (t-\varepsilon ,t+\varepsilon )} g_n(l,t)\,\mathrm {d}l\ . \end{aligned}$$

The recurrence formula for the Chebyshev polynomials implies

$$\begin{aligned} g_n(l,t) =T_n\left( \frac{\lambda +\Lambda -2t}{\Lambda -\lambda }\right) g_0(l,t) +\sum _{i=1}^n U_{i-1}\left( \frac{\lambda +\Lambda -2t}{\Lambda -\lambda }\right) r_{n-i}(l)\ , \end{aligned}$$

where \(U_n\) denote the Chebyshev polynomials of the second kind and

$$\begin{aligned} r_n(l)=\frac{ \frac{2(1+\mathrm {sign}n)}{\Lambda -\lambda }\cdot T_n\left( \frac{\lambda +\Lambda -2l}{\Lambda -\lambda }\right) }{\pi \sqrt{\left( \frac{\Lambda -\lambda }{2}\right) ^2 -\left( l-\frac{\Lambda +\lambda }{2}\right) ^2}}\ . \end{aligned}$$

After all the needed calculations (during the process, we find that many of the computed integrals converge to zero as \(\varepsilon \rightarrow 0^+\)), we get, according to (8.60), the Fourier coefficients

$$\begin{aligned} c_n(x)= & {} \sqrt{2}\lim _{\varepsilon \rightarrow 0^+}\left[ -\int \limits _0^{\lambda -\varepsilon } \frac{T_n\left( \frac{\lambda +\Lambda -2t}{\Lambda -\lambda }\right) }{\sqrt{(\Lambda -t)(\lambda -t)}}\,\mathrm {d}t +\int \limits _0^x \frac{2U_{n-1}\left( \frac{\lambda +\Lambda -2t}{\Lambda -\lambda }\right) }{\Lambda -\lambda }\,\mathrm {d}t\right] \ \nonumber \\= & {} \frac{\sqrt{2}}{n}\left[ e^{-n\mathop {\mathrm {argcosh}}\frac{\lambda +\Lambda }{\Lambda -\lambda }} -\cos \left( n\arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\right) \right] \ \ \hbox {for }n\in \mathcal {N}\ ,\qquad \quad \end{aligned}$$
(8.61)
$$\begin{aligned} c_0(x)= & {} -\lim _{\varepsilon \rightarrow 0^+} \int \limits _0^{\lambda -\varepsilon } \frac{1}{\sqrt{(\Lambda -t)(\lambda -t)}}\,\mathrm {d}t\ = \ -\mathop {\mathrm {argcosh}}\frac{\lambda +\Lambda }{\Lambda -\lambda }\ . \end{aligned}$$
(8.62)
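The closed forms (8.61) and (8.62) can be verified numerically: mapping the Gauss–Chebyshev nodes onto \((\lambda ,\Lambda )\) turns the weighted integrals (8.57) into plain averages. A minimal Python/NumPy sketch (the values of \(\lambda \), \(\Lambda \), x are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

lam, Lam, x = 1.0, 100.0, 30.0
N = 200_000                                   # number of quadrature nodes
theta = (2.0 * np.arange(1, N + 1) - 1.0) * np.pi / (2.0 * N)
s = np.cos(theta)                             # Gauss-Chebyshev nodes in (-1, 1)
l = 0.5 * (lam + Lam - s * (Lam - lam))       # s = (lam + Lam - 2l)/(Lam - lam)
f = np.log(np.abs(1.0 - x / l) + 1e-300)      # expanded function; guard avoids log(0)

# Fourier coefficients against b_0 = 1, b_n = sqrt(2) T_n(s);
# the Chebyshev weight together with the factor 1/pi is absorbed by the mean
c0_num = np.mean(f)
c_num = lambda k: np.sqrt(2.0) * np.mean(f * np.cos(k * theta))

zeta = np.arccosh((lam + Lam) / (Lam - lam))
xi = np.arccos((lam + Lam - 2.0 * x) / (Lam - lam))
c0_exact = -zeta                                                             # (8.62)
c_exact = lambda k: np.sqrt(2.0) / k * (np.exp(-k * zeta) - np.cos(k * xi))  # (8.61)
```

The logarithmic singularity at \(l=x\) is integrable, so the quadrature converges, although only slowly near \(l=x\); hence the modest tolerance of the comparison.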

The Fourier coefficients \(c_n(x)\) were calculated with the aim to express the variance \(\widehat{\sigma }^2_R(x)\) according to (8.58). Before the attempt to calculate this sum, we define the parameters

$$\begin{aligned} \xi= & {} \arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\ , \end{aligned}$$
(8.63)
$$\begin{aligned} \zeta= & {} \mathop {\mathrm {argcosh}}\frac{\lambda +\Lambda }{\Lambda -\lambda }\ =\ \ln \frac{\sqrt{\kappa (\mathsf {A})}+1}{\sqrt{\kappa (\mathsf {A})}-1}\ . \end{aligned}$$
(8.64)
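The second equality in (8.64) follows from \(\kappa (\mathsf {A})=\Lambda /\lambda \); a quick numerical check (the spectrum bounds are arbitrary illustrative values):

```python
import math

lam, Lam = 0.5, 800.0          # arbitrary spectrum bounds (assumption)
kappa = Lam / lam
zeta = math.acosh((lam + Lam) / (Lam - lam))
alt = math.log((math.sqrt(kappa) + 1.0) / (math.sqrt(kappa) - 1.0))
```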

The sum (8.58) may be written using (8.61) in the form

$$\begin{aligned} \widehat{\sigma }^2_R(x)=2\sum _{n\ge 1}\frac{1}{n^2}\left[ \left( 1-e^{-n\zeta }\right) ^2 -2\left( 1-e^{-n\zeta }\right) (1-\cos n\xi ) +(1-\cos n\xi )^2\right] \end{aligned}$$
(8.65)

We did not find an analytic expression for the sum (8.65); nevertheless, it may be approximated rather accurately, particularly for small values of \(\zeta \) corresponding to large condition numbers \(\kappa (\mathsf {A})\).

The contribution of the first term in the square bracket in (8.65) to the total sum is

$$\begin{aligned} \sum _{n\ge 1}\frac{1}{n^2}\left( 1-e^{-n\zeta }\right) ^2 =\sum _{n\ge 1}\left[ \int \limits _0^{\zeta }e^{-nt}\,\mathrm {d}t\right] ^2 =\sum _{n\ge 1}\int \limits _{\langle 0,\zeta \rangle \times \langle 0,\zeta \rangle } e^{-n(t+u)}\,\mathrm {d}S\ . \end{aligned}$$
(8.66)

We describe the integration area \([t,u]\in \langle 0,\zeta \rangle \times \langle 0,\zeta \rangle \) using the variables

$$\begin{aligned} \begin{array}{rcl} v&{}=&{}t+u\ ,\\ w&{}=&{}t-u\ . \end{array} \end{aligned}$$

Then, the integral (8.66) may be transformed into the form

$$\begin{aligned} \sum _{n\ge 1}\frac{1}{n^2}\left( 1-e^{-n\zeta }\right) ^2 =2\zeta \ln \left( 1+e^{-\zeta }\right) +2\int \limits _0^{\zeta } \frac{ve^{-v}}{1+e^{-v}}\,\mathrm {d}v\ .\ ~\ ~\ \end{aligned}$$
(8.67)
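The identity (8.67) can be confirmed numerically by truncating the series and evaluating the remaining integral with a trapezoidal rule; the value of \(\zeta \) below is an arbitrary test choice.

```python
import numpy as np

zeta = 0.8
n = np.arange(1, 2_000_001, dtype=np.float64)
lhs = np.sum((1.0 - np.exp(-n * zeta)) ** 2 / n ** 2)   # left-hand side of (8.67)

v = np.linspace(0.0, zeta, 200_001)
integrand = v * np.exp(-v) / (1.0 + np.exp(-v))
integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(v)) / 2.0  # trapezoid rule
rhs = 2.0 * zeta * np.log(1.0 + np.exp(-zeta)) + 2.0 * integral
```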

We can estimate the remaining integral on the right-hand side of (8.67) using the approximation

$$\begin{aligned} v\approx \frac{2}{3}\sinh \frac{v}{2}+\frac{8}{3}\tanh \frac{v}{4}\ . \end{aligned}$$
(8.68)
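A Taylor expansion shows that \(\frac{2}{3}\sinh \frac{v}{2}+\frac{8}{3}\tanh \frac{v}{4}-v=\frac{v^5}{1920}+O(v^7)\): the cubic terms cancel, which explains why (8.68) is both an upper estimate for small \(v>0\) and very tight. A quick check:

```python
import math

def approx(v):
    # right-hand side of (8.68)
    return (2.0 / 3.0) * math.sinh(v / 2.0) + (8.0 / 3.0) * math.tanh(v / 4.0)
```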

This (upper) estimate is very accurate for common condition numbers \(\kappa (\mathsf {A})\); values of the inaccuracy of the result are demonstrated in Table 2.

Table 2 Inaccuracy inserted into the sum (8.65) applying the estimate (8.68) in (8.67)

The inaccuracy is proportional to \([\kappa (\mathsf {A})]^{-3}\) for sufficiently large \(\kappa (\mathsf {A})\), and its values are negligible in comparison with the total sum in (8.65).

After the integration in (8.67) using the approximation (8.68), we get

$$\begin{aligned} 2\sum _{n\ge 1}\frac{1}{n^2}\left( 1-e^{-n\zeta }\right) ^2\approx & {} 4\ln \frac{\sqrt{\kappa (\mathsf {A})}+1}{\sqrt{\kappa (\mathsf {A})}-1} \ln \frac{2\sqrt{\kappa (\mathsf {A})}}{\sqrt{\kappa (\mathsf {A})}+1} +\frac{8}{3}\left[ \frac{\sqrt{\kappa (\mathsf {A})}-1}{\sqrt{\kappa (\mathsf {A})-1}}-1\right] \nonumber \\&+\,\frac{32}{3}\ln \frac{\sqrt{\kappa (\mathsf {A})-1}+\sqrt{\kappa (\mathsf {A})}}{2(\sqrt{\kappa (\mathsf {A})}+1)} +\frac{40}{3}\arctan \frac{1}{\sqrt{\kappa (\mathsf {A})-1}}\ .\nonumber \\ \end{aligned}$$
(8.69)

The second term in the square bracket in (8.65) may be expressed in a similar way (without the use of variables v, w). The application of Euler’s formula and the approximation (8.68) give

$$\begin{aligned}&-4\sum _{n\ge 1}\frac{1}{n^2}\left( 1-e^{-n\zeta }\right) \left( 1-\cos \,n\xi \right) =-4\sum _{n\ge 1}\int \limits _0^{\zeta }e^{-nt}\,\mathrm {d}t \int \limits _0^{\xi }\sin nu\,\mathrm {d}u\nonumber \\&\quad =-4\int \limits _0^{\zeta }\int \limits _0^{\xi } \frac{e^{-t}\sin u}{1-2e^{-t}\cos u+e^{-2t}}\,\mathrm {d}u\,\mathrm {d}t\ =\nonumber \\&\quad =-2\left[ \zeta \ln \frac{\cosh \zeta -\cos \xi }{\cosh \zeta -1} -\int \limits _0^{\zeta } \frac{t\sinh t(\cos \xi -1)}{(\cosh t-\cos \xi )(\cosh t-1)}\,\mathrm {d}t \right] \nonumber \\&\quad \approx -2\ln \frac{\sqrt{\kappa (\mathsf {A})}+1}{\sqrt{\kappa (\mathsf {A})}-1} \ln \frac{x}{\lambda } +\frac{32}{3}\left( \sqrt{\kappa (\mathsf {A})}-\sqrt{\kappa (\mathsf {A})-1}\right) \nonumber \\&\quad -\frac{8}{3}\left( 5+\sqrt{\frac{\Lambda -x}{\Lambda -\lambda }}\right) \frac{\sqrt{x-\lambda }}{\sqrt{\Lambda -\lambda }+\sqrt{\Lambda -x}} \mathop {\mathrm {arccotan}}\sqrt{\frac{x}{\lambda }-1}\nonumber \\&\quad -\frac{64}{3}\sqrt{\frac{\Lambda -x}{x-\lambda }} \arctan \frac{(\sqrt{\kappa (\mathsf {A})}-\sqrt{\kappa (\mathsf {A})-1})\sqrt{x-\lambda }}{\sqrt{\Lambda -\lambda }+\sqrt{\Lambda -x}} \ ; \end{aligned}$$
(8.70)

the inserted inaccuracy may be estimated, using the Schwarz inequality, as

$$\begin{aligned} |D|= & {} 2\int \limits _0^{\zeta } \frac{(\frac{2}{3}\sinh \frac{t}{2}+\frac{8}{3}\tanh \frac{t}{4}-t)\sinh t(1-\cos \xi )}{(\cosh t-\cos \xi )(\cosh t-1)}\,\mathrm {d}t\nonumber \\= & {} \sin ^2\frac{\xi }{2}\int \limits _0^{\zeta } \frac{(\frac{2}{3}\sinh \frac{t}{2}+\frac{8}{3}\tanh \frac{t}{4}-t)\sinh t}{(\sinh ^2\frac{t}{2}+\sin ^2\frac{\xi }{2})\sinh ^2\frac{t}{2}}\,\mathrm {d}t\\\le & {} \sin ^2\frac{\xi }{2}\int \limits _0^{\zeta } \frac{(\frac{2}{3}\sinh \frac{t}{2}+\frac{8}{3}\tanh \frac{t}{4}-t)\cosh \frac{t}{2}}{\sinh ^2\frac{t}{2}\sin \frac{\xi }{2}}\,\mathrm {d}t\nonumber \\= & {} \sin \frac{\xi }{2}\cdot I(\kappa (\mathsf {A})) =\sqrt{\frac{x-\lambda }{\Lambda -\lambda }}\cdot I(\kappa (\mathsf {A})) \ ; \end{aligned}$$

values of the integral \(I(\kappa (\mathsf {A}))\) are demonstrated in Table 3.

Table 3 Inaccuracy inserted into the sum (8.65) applying the estimate (8.68) calculating (8.70)

The value \(I(\kappa (\mathsf {A}))\) is proportional to \([\kappa (\mathsf {A})]^{-2}\) for sufficiently large \(\kappa (\mathsf {A})\), which is still negligible.

The third term in the square bracket in (8.65) may be expressed exactly because of the known fact that the sum \(\sum \frac{1}{n^2}\cos nx\) is the trigonometric series representation of the function \(\frac{1}{4}(x-\pi )^2-\frac{\pi ^2}{12}\) on the interval \(\langle 0,2\pi \rangle \). Since \(\xi \in \langle 0,\pi \rangle \), we can use this result also for the sum containing \(\cos ^2 n\xi =\frac{1}{2}(1+\cos 2n\xi )\) and consequently

$$\begin{aligned} \sum _{n\ge 1}\frac{1}{n^2}\left( 1-\cos (n\xi )\right) ^2= & {} \frac{\pi ^2}{6}-\left[ \frac{1}{2}(\xi -\pi )^2-\frac{\pi ^2}{6}\right] +\frac{\pi ^2}{12}+\frac{1}{8}(2\xi -\pi )^2-\frac{\pi ^2}{24}\nonumber \\= & {} \frac{\pi }{2}\xi =\frac{\pi }{2}\arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\ . \end{aligned}$$
(8.71)
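The result (8.71) is easy to confirm numerically for any \(\xi \in (0,\pi )\), e.g., with a truncated series:

```python
import numpy as np

xi = 2.0                                             # any value in (0, pi)
n = np.arange(1, 1_000_001, dtype=np.float64)
s = np.sum((1.0 - np.cos(n * xi)) ** 2 / n ** 2)     # left-hand side of (8.71)
```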

We can substitute the results (8.69), (8.70), (8.71) into (8.65); however, the resulting expression is very complicated and unilluminating. Therefore, we prefer the graphical illustration in Fig. 14. When the matrix \(\mathsf {A}\) is sufficiently ill-conditioned (\(\kappa (\mathsf {A})>10^4\)), the contributions of the expressions (8.69), (8.70) to the sum (8.65) are negligible except for the values \(x\sim \lambda \); therefore, we can write

$$\begin{aligned} \widehat{\sigma }^2_R(x) \approx \pi \arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda }\ . \end{aligned}$$
Fig. 14 Dependence of the function \(\widehat{\sigma }^2_R(x)\) on the condition number \(\kappa (\mathsf {A})\)
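The quality of this approximation can be probed directly: sum the series (8.65), whose square bracket is the perfect square \((\cos n\xi -e^{-n\zeta })^2\) (cf. (8.61)), and compare it with \(\pi \xi \). The sketch below uses the illustrative choices \(\kappa (\mathsf {A})=10^6\) and a point x in the middle of the spectrum, where the corrections (8.69), (8.70) are small.

```python
import numpy as np

lam, kappa = 1.0, 1.0e6
Lam = kappa * lam
x = 0.5 * Lam                                        # a point away from lam
zeta = np.arccosh((lam + Lam) / (Lam - lam))         # (8.64)
xi = np.arccos((lam + Lam - 2.0 * x) / (Lam - lam))  # (8.63)

n = np.arange(1, 2_000_001, dtype=np.float64)
sigma2 = 2.0 * np.sum((np.cos(n * xi) - np.exp(-n * zeta)) ** 2 / n ** 2)  # (8.65)
approx = np.pi * xi                                  # the asymptotic formula above
```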

The method used also allows the calculation of the correlations of the random variables \(R_x\), \(R_y\). The covariance of these variables may be written in a form analogous to (8.55)

$$\begin{aligned} \mathrm {cov}(R_x,R_y) =\left<\ln \left| 1-\frac{x}{l}\right| \,|\ln \left| 1-\frac{y}{l}\right| \right> _{L_{2,f}(\lambda ,\Lambda )} -E_R(x)E_R(y) =\sum _{n\ge 1}c_n(x)c_n(y), \end{aligned}$$

and the results (8.61) may be used. Analogously to (8.65)

$$\begin{aligned} \mathrm {cov}(R_x,R_y)= & {} 2\sum _{n\ge 1}\frac{1}{n^2}\left[ \left( 1-e^{-n\zeta }\right) ^2 -\left( 1-e^{-n\zeta }\right) \left( 1-\cos n\xi \right) \right. \nonumber \\&\left. -\left( 1-e^{-n\zeta }\right) \left( 1-\cos n\eta \right) +\left( 1-\cos n\xi \right) \left( 1-\cos n\eta \right) \right] , \qquad \quad \end{aligned}$$
(8.72)

where

$$\begin{aligned} \eta =\arccos \frac{\lambda +\Lambda -2y}{\Lambda -\lambda }\ . \end{aligned}$$

The summation of the first three terms in the square bracket in (8.72) was performed above; the last term may be expressed using the trigonometric sum applied in (8.71). When applying it, we must take into account the fact that the formula used is valid for non-negative arguments:

$$\begin{aligned} 2\sum _{n\ge 1}\frac{1}{n^2}\left( 1-\cos n\xi \right) \left( 1-\cos n\eta \right)= & {} \frac{1}{2}(\xi +\eta )\pi -\frac{1}{2}|\xi -\eta |\pi \ =\ \pi \min (\xi ,\eta )\\= & {} \pi \arccos \frac{\lambda +\Lambda -2\min (x,y)}{\Lambda -\lambda }\ . \end{aligned}$$

If \(\kappa (\mathsf {A})\gg 1\) and \(\min (x,y)\gtrsim \sqrt{\lambda \Lambda }\), this term significantly exceeds the remaining ones in (8.72), and the correlation

$$\begin{aligned} \varrho (R_x,R_y)\approx \frac{\pi \arccos \frac{\lambda +\Lambda -2\min (x,y)}{\Lambda -\lambda }}{\pi \sqrt{\arccos \frac{\lambda +\Lambda -2x}{\Lambda -\lambda } \arccos \frac{\lambda +\Lambda -2y}{\Lambda -\lambda }}} =\sqrt{\frac{\arccos \frac{\lambda +\Lambda -2\min (x,y)}{\Lambda -\lambda }}{\arccos \frac{\lambda +\Lambda -2\max (x,y)}{\Lambda -\lambda }}}\ . \end{aligned}$$
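The trigonometric sum above, \(2\sum _{n\ge 1}\frac{1}{n^2}(1-\cos n\xi )(1-\cos n\eta )=\pi \min (\xi ,\eta )\), and hence the correlation formula, can be checked numerically:

```python
import numpy as np

xi, eta = 1.0, 2.0                # both in (0, pi); arbitrary test values
n = np.arange(1, 1_000_001, dtype=np.float64)
s = 2.0 * np.sum((1.0 - np.cos(n * xi)) * (1.0 - np.cos(n * eta)) / n ** 2)
```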

Appendix 6: Estimates of the Hessian Matrix Eigenvalues

If \(V(\mathsf {x})\) is a quadratic function, then the estimate \(\hat{\lambda }^{(j+1)}\) given by (4.4) represents the second derivative of \(V(\mathsf {x})\) in the direction \(\mathsf {r}_j\) (independently of the point \(\mathsf {x}\)). In the case of non-quadratic \(V(\mathsf {x})\), we can proceed analogously: We will propose a suitable value of the second derivative of \(V(\mathsf {x})\) along the jth step of the iterative procedure for the estimate \(\hat{\lambda }^{(j+1)}\).

Let \(V(\mathsf {x}_{j-1})\), \(V(\mathsf {x}_j)\) and \(\mathsf {g}_{j-1}=\nabla V(\mathsf {x}_{j-1})\), \(\mathsf {g}_j=\nabla V(\mathsf {x}_j)\) be given. We define a function of one variable

$$\begin{aligned} \widehat{V}(t)=V(\mathsf {x}_{j-1}+t\mathsf {p}_j)\ , \end{aligned}$$

where \(\mathsf {p}_j=\mathsf {x}_j-\mathsf {x}_{j-1}=-\gamma _j\mathsf {g}_{j-1}\). The second derivative of V taken at the point \(\mathsf {x}_{j-1}+t\mathsf {p}_j\) along the direction \(\mathsf {p}_j\) is

$$\begin{aligned} \frac{\partial ^2V(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\partial \left( \mathsf {p}_j/\Vert \mathsf {p}_j\Vert \right) ^2}= \frac{1}{\Vert \mathsf {p}_j\Vert ^2}\frac{\mathrm {d}^2}{\mathrm {d}t^2}V(\mathsf {x}_{j-1}+t\mathsf {p}_j) =\frac{\widehat{V}''(t)}{\Vert \mathsf {p}_j\Vert ^2}\ . \end{aligned}$$
(8.73)

The values

$$\begin{aligned} \widehat{V}(0)=V(\mathsf {x}_{j-1})\ ,\ \ \widehat{V}(1)=V(\mathsf {x}_j)\ ,\ \ \widehat{V}'(0)=\langle \mathsf {g}_{j-1}|\mathsf {p}_j\rangle \ ,\ \ \widehat{V}'(1)=\langle \mathsf {g}_j|\mathsf {p}_j\rangle \ \end{aligned}$$
(8.74)

are known; therefore, we can interpolate the function \(\widehat{V}(t)\) by a cubic polynomial

$$\begin{aligned} \widehat{P}(t)=at^3+bt^2+ct+d \end{aligned}$$
(8.75)

where

$$\begin{aligned} a= & {} \langle \mathsf {g}_j|\mathsf {p}_j\rangle +\langle \mathsf {g}_{j-1}|\mathsf {p}_j\rangle -2\left[ V(\mathsf {x}_j)-V(\mathsf {x}_{j-1})\right] \ , \end{aligned}$$
(8.76)
$$\begin{aligned} b= & {} 3\left[ V(\mathsf {x}_j)-V(\mathsf {x}_{j-1})\right] -2\langle \mathsf {g}_{j-1}|\mathsf {p}_j\rangle -\langle \mathsf {g}_j|\mathsf {p}_j\rangle \ , \end{aligned}$$
(8.77)
$$\begin{aligned} c= & {} \langle \mathsf {g}_{j-1}|\mathsf {p}_j\rangle \ , \end{aligned}$$
(8.78)

satisfying the relations (8.74). Substituting the polynomial \(\widehat{P}(t)\) for \(\widehat{V}(t)\) in (8.73) gives

$$\begin{aligned} \frac{\partial ^2V(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\partial \left( \mathsf {p}_j/\Vert \mathsf {p}_j\Vert \right) ^2} =\frac{6at+2b}{\Vert \mathsf {p}_j\Vert ^2}\ . \end{aligned}$$
(8.79)
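The coefficients (8.76)–(8.78), together with \(d=V(\mathsf {x}_{j-1})\), indeed reproduce the four interpolation conditions (8.74); a self-contained sketch with an arbitrary smooth test function (the function and the points are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x_prev = rng.normal(size=5)
p = rng.normal(size=5)                  # the step x_j - x_{j-1}
x_next = x_prev + p

def V(x):                               # arbitrary smooth test function (assumption)
    return np.sum(x ** 4) / 4.0 + np.sum(x ** 2) / 2.0

def grad(x):                            # its gradient
    return x ** 3 + x

gp_prev = grad(x_prev) @ p              # <g_{j-1} | p_j>
gp_next = grad(x_next) @ p              # <g_j | p_j>
dV = V(x_next) - V(x_prev)

a = gp_next + gp_prev - 2.0 * dV        # (8.76)
b = 3.0 * dV - 2.0 * gp_prev - gp_next  # (8.77)
c = gp_prev                             # (8.78)
d = V(x_prev)

P = lambda t: a * t ** 3 + b * t ** 2 + c * t + d    # (8.75)
dP = lambda t: 3.0 * a * t ** 2 + 2.0 * b * t + c
```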

The considered step \(\mathsf {p}_j\) is characterized by the values \(t\in \langle 0,1\rangle \). In accordance with the considerations performed above, the maximum suitable value provided by (8.79) will be selected. So, depending on the values a, b, c:

  1. 1.

    If \(a\le 0\), then the corresponding maximum is achieved for \(t=0\), so \(\hat{\lambda }^{(j+1)}=\frac{2b}{\Vert \mathsf {p}_j\Vert ^2}\). Nevertheless, the interpolating cubic polynomial \(\widehat{P}(t)\) may be decreasing on the whole set \(\mathcal {R}\), and in that case, it gives no information about the behavior of the function V around its potential local minimum. This situation occurs when the derivative of the polynomial \(\widehat{P}(t)\) has at most one real root, i.e., when

    $$\begin{aligned} b^2-3ac\le 0\ . \end{aligned}$$
    (8.80)

    In this case, the result is not accepted. If \(b\le 0\), then the function \(\widehat{V}(t)\) is not convex in the observed region, and its behavior has no relation to its properties in the neighborhood of its local minimum; therefore, the result is disregarded too.

  2. 2.

    If \(a>0\), then the value (8.79) increases with increasing t. Since \(\widehat{V}'(0)<0\) (the vectors \(\mathsf {g}_{j-1}\) and \(\mathsf {p}_j\) are oppositely oriented), there exists a \(t_m>0\) such that the function \(\widehat{V}(t)\) attains its local minimum at \(t=t_m\); evidently, \(\widehat{V}''(t_m)>0\). Now, two situations are possible:

    1. (a)

      If \(t_m\le 1\) (which is equivalent to \(\widehat{V}'(1)=\langle \mathsf {g}_j|\mathsf {p}_j\rangle \ge 0\)), then the value \(t=1\) provides the maximum relevant value \(\widehat{V}''(t)\) associated with the jth step and \(\hat{\lambda }^{(j+1)}=\frac{6a+2b}{\Vert \mathsf {p}_j\Vert ^2}\).

    2. (b)

      If \(t_m>1\), then the step \(\mathsf {p}_j\) is too short, and it does not reach the local minimum of the function \(\widehat{V}\). Since the points of minima are our main interest, we prefer to work with the value \(\widehat{V}''(t_m)>\widehat{V}''(1)\). The value \(t_m\) is the positive root of the equation

      $$\begin{aligned} 3at^2+2bt+c=0\ , \end{aligned}$$

      thus

      $$\begin{aligned} \hat{\lambda }^{(j+1)}=\frac{6at_m+2b}{\Vert \mathsf {p}_j\Vert ^2} =\frac{2\sqrt{b^2-3ac}}{\Vert \mathsf {p}_j\Vert ^2}\ . \end{aligned}$$
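The case analysis above can be condensed into a short routine; the following is a sketch (not the author's code), assuming a descent step so that \(c=\langle \mathsf {g}_{j-1}|\mathsf {p}_j\rangle <0\) whenever \(a>0\):

```python
import math

def lambda_hat(a, b, c, p_norm_sq):
    """Lower spectral estimate from the jth step; returns None when the
    cubic interpolation gives no usable information."""
    disc = b * b - 3 * a * c
    if disc <= 0:                    # (8.80): P'(t) has at most one real root
        return None
    if a <= 0:
        if b <= 0:                   # V not convex in the observed region
            return None
        return 2.0 * b / p_norm_sq   # maximum of (8.79) on <0,1> at t = 0
    if 3 * a + 2 * b + c >= 0:       # V'(1) = <g_j|p_j> >= 0, i.e. t_m <= 1
        return (6 * a + 2 * b) / p_norm_sq    # case 2(a): t = 1
    return 2.0 * math.sqrt(disc) / p_norm_sq  # case 2(b): t = t_m > 1
```

For quadratic data (where \(a=0\)) the routine returns \(2b/\Vert \mathsf {p}_j\Vert ^2=\langle \mathsf {p}_j|\mathsf {A}\mathsf {p}_j\rangle /\Vert \mathsf {p}_j\Vert ^2\), the second directional derivative, as expected.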

As for the estimate \(\hat{\Lambda }^{(j+1)}\), the formula (4.4) based on (4.2) represents the second derivative of the function V in the direction \(\sqrt{\mathsf {H}}\mathsf {p}_j\) in the case of a quadratic function \(V(\mathsf {x})\) bounded from below; then, the Hessian matrix \(\mathsf {H}\) is positive definite, and its square root exists. Therefore, we try to estimate the value \(\hat{\Lambda }^{(j+1)}\) corresponding to the jth step as the minimum relevant value

$$\begin{aligned} \hat{\Lambda }^{(j+1)} =\frac{\partial ^2V(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\partial (\sqrt{\mathsf {H}}\mathsf {p}_j/\Vert \sqrt{\mathsf {H}}\mathsf {p}_j\Vert )^2} =\frac{\Vert \mathsf {H}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\mathsf {p}_j\Vert ^2}{\langle \mathsf {p}_j|\mathsf {H}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\mathsf {p}_j\rangle }\ . \end{aligned}$$
(8.81)

The vector \(\mathsf {H}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\mathsf {p}_j\) may be expressed in the form

$$\begin{aligned} \mathsf {H}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\mathsf {p}_j =\frac{\mathrm {d}\,\mathsf {g}(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\mathrm {d}\,t}. \end{aligned}$$
(8.82)

We decompose the gradients \(\mathsf {g}(\mathsf {x})\) into two parts: the component \(\mathsf {g}_{\parallel }\) parallel to \(\mathsf {p}_j\) and the component \(\mathsf {g}_{\perp }\) perpendicular to \(\mathsf {p}_j\):

$$\begin{aligned} \mathsf {g}_{\parallel }(\mathsf {x}_{j-1}+t\mathsf {p}_j)= & {} \frac{\langle \mathsf {g}(\mathsf {x}_{j-1}+t\mathsf {p}_j)|\mathsf {p}_j\rangle }{\Vert \mathsf {p}_j\Vert ^2}\mathsf {p}_j\ ,\\ \mathsf {g}_{\perp }(\mathsf {x}_{j-1}+t\mathsf {p}_j)= & {} \mathsf {g}(\mathsf {x}_{j-1}+t\mathsf {p}_j) -\mathsf {g}_{\parallel }(\mathsf {x}_{j-1}+t\mathsf {p}_j)\ , \end{aligned}$$

Since we do not have enough information about the gradients \(\mathsf {g}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\), we must approximate them. For the component \(\mathsf {g}_{\parallel }\), the cubic interpolation (8.75) of the function \(\widehat{V}(t)\) is the most accessible one, since

$$\begin{aligned} \langle \mathsf {g}(\mathsf {x}_{j-1}+t\mathsf {p}_j)|\mathsf {p}_j\rangle =\frac{\mathrm {d}V(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\mathrm {d}\,t} \approx 3at^2+2bt+c\ . \end{aligned}$$

For the perpendicular component, only two entries are available: the value \(\mathsf {g}_{j-1\perp }=\mathsf {0}\) and the value \(\mathsf {g}_{j\perp }\) corresponding to \(t=1\). Therefore, the linear interpolation

$$\begin{aligned} \mathsf {g}_{\perp }(\mathsf {x}_{j-1}+t\mathsf {p}_j) \approx t\left[ \mathsf {g}_j -\frac{\langle \mathsf {g}_j|\mathsf {p}_j\rangle }{\Vert \mathsf {p}_j\Vert ^2}\mathsf {p}_j\right] \end{aligned}$$

is applied. The derivatives of the particular components of gradients will be

$$\begin{aligned} \frac{\mathrm {d}\,\mathsf {g}_{\parallel }(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\mathrm {d}\,t} =\frac{6at+2b}{\Vert \mathsf {p}_j\Vert ^2}\cdot \mathsf {p}_j\ ,\ \ \frac{\mathrm {d}\,\mathsf {g}_{\perp }(\mathsf {x}_{j-1}+t\mathsf {p}_j)}{\mathrm {d}\,t} =\mathsf {g}_j -\frac{\langle \mathsf {g}_j|\mathsf {p}_j\rangle }{\Vert \mathsf {p}_j\Vert ^2}\mathsf {p}_j\ , \end{aligned}$$

and according to (8.82), we get the interpolation

$$\begin{aligned} \mathsf {H}(\mathsf {x}_{j-1}+t\mathsf {p}_j)\mathsf {p}_j =\frac{6at+2b}{\Vert \mathsf {p}_j\Vert ^2}\cdot \mathsf {p}_j +\mathsf {g}_j -\frac{\langle \mathsf {g}_j|\mathsf {p}_j\rangle }{\Vert \mathsf {p}_j\Vert ^2}\mathsf {p}_j\ . \end{aligned}$$

Using this result in (8.81), we get

$$\begin{aligned} \hat{\Lambda }^{(j+1)}= & {} \frac{(6at+2b)^2 +\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2 -\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}{(6at+2b)\Vert \mathsf {p}_j\Vert ^2}\nonumber \\= & {} \frac{1}{\Vert \mathsf {p}_j\Vert ^2}\cdot \left[ 6at+2b +\frac{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2 -\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}{6at+2b}\right] \end{aligned}$$
(8.83)

for an appropriate t.

If \(6at+2b>0\), the expression (8.83) acquires its minimal value at

$$\begin{aligned} 6at+2b=\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2 -\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\ , \end{aligned}$$

assuming \(\mathsf {g}_j\) not to be parallel to \(\mathsf {p}_j\). The choice of the parameter t depends on the values a, b, c:

  1. 1.

    If \(a\le 0\) and \(b\le 0\), then the value \(\hat{\Lambda }^{(j+1)}\) is not proposed.

  2. 2.

    If \(a\le 0\) and \(b\in (0,\frac{1}{2}\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2-\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\rangle \), or \(a\ge 0\) and \(b\ge \frac{1}{2}\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2-\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\), then the expression (8.83) increases for \(t\ge 0\), so the value \(t=0\) is used in (8.83).

  3. 3.

    If \(a>0\) and \(b\le \frac{1}{2}\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2-\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\), or \(a<0\) and \(b\ge \frac{1}{2}\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2-\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\), then the value

    $$\begin{aligned} t_0=\frac{1}{6a}\left[ \sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2 -\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}-2b\right] \ge 0 \end{aligned}$$

    is expressed. Now:

    1. (a)

      If \(t_0\le 1\), then the minimal possible value of the expression (8.83) is attained at an interior point of the step \(\mathsf {p}_j\), so the value \(t_0\) is used in (8.83).

    2. (b)

      If \(t_0>1\), then the expression (8.83) is a decreasing function of the parameter \(t\in \langle 0,t_0\rangle \); therefore, the selection \(t=1\) is recommended. Nevertheless, if, in addition, \(a>0\), then we can consider also all the values \(t\le t_m\) (\(t_m\) is the parameter minimizing the value \(\widehat{V}(t)\), see above). The possibility \(t_m>1\) is characterized by the condition \(\langle \mathsf {p}_j|\mathsf {g}_j\rangle <0\), and in this case, the value \(t=\min (t_0,t_m)\) is used in (8.83).

  4. 4.

    Analogously to the calculation of the value \(\hat{\lambda }^{(j+1)}\), the results are disregarded, when the condition (8.80) is satisfied.
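The selection of t for the estimate (8.83) can likewise be condensed; the following is a sketch of the case analysis above (not the author's code), with \(D=\sqrt{\Vert \mathsf {g}_j\Vert ^2\Vert \mathsf {p}_j\Vert ^2-\langle \mathsf {g}_j|\mathsf {p}_j\rangle ^2}\):

```python
import math

def Lambda_hat(a, b, c, g_norm_sq, p_norm_sq, gp):
    """Upper spectral estimate via (8.83); gp = <g_j|p_j>.
    Returns None when no estimate is proposed (cases 1 and 4)."""
    if (a <= 0 and b <= 0) or b * b - 3 * a * c <= 0:
        return None
    D = math.sqrt(max(g_norm_sq * p_norm_sq - gp * gp, 0.0))  # Cauchy-Schwarz: >= 0
    value = lambda t: (6*a*t + 2*b + D*D / (6*a*t + 2*b)) / p_norm_sq  # (8.83)
    if (a <= 0 and 0 < 2*b <= D) or (a >= 0 and 2*b >= D):
        return value(0.0)                    # case 2: (8.83) increases for t >= 0
    t0 = (D - 2*b) / (6*a)                   # stationary point: 6 a t0 + 2 b = D
    if t0 <= 1:
        return value(t0)                     # case 3(a)
    if a > 0 and gp < 0:                     # case 3(b) with t_m > 1: step undershoots
        tm = (-2*b + math.sqrt(b*b - 3*a*c)) / (3*a)
        return value(min(t0, tm))
    return value(1.0)                        # case 3(b): t = 1
```

For quadratic data the result coincides with \(\Vert \mathsf {H}\mathsf {p}_j\Vert ^2/\langle \mathsf {p}_j|\mathsf {H}\mathsf {p}_j\rangle \), as the derivation of (8.83) suggests.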

The values \(\hat{\lambda }^{(j+1)}\), \(\hat{\Lambda }^{(j+1)}\) cannot be expressed when \(a\le 0\) and \(b\le 0\), or in the case (8.80); in all the remaining situations, the estimates are available. The unavailability of the values \(\hat{\lambda }^{(j+1)}\), \(\hat{\Lambda }^{(j+1)}\) may be handled in two different ways:

  1. 1.

    In the first step (\(j=0\)), we need to initialize the estimates \(\overline{\lambda }^{(1)}\), \(\underline{\Lambda }^{(1)}\). First, we propose a step length parameter \(\gamma _1\). If the unwanted situation mentioned above occurs, the easiest way to get some results is to increase the value \(\gamma _1\). Indeed, if \(\mathsf {p}_1=-\gamma _1\mathsf {g}_0\), then adding Eqs. (8.76) and (8.77) results in

    $$\begin{aligned} a+b=V(\mathsf {x}_1)-V(\mathsf {x}_0)+\gamma _1\Vert \mathsf {g}_0\Vert ^2 \ge V_0-V(\mathsf {x}_0)+\gamma _1\Vert \mathsf {g}_0\Vert ^2\ , \end{aligned}$$
    (8.84)

    since the function \(V(\mathsf {x})\) is assumed to be bounded from below by a value \(V_0\). Therefore, if \(\mathsf {g}_0\ne \mathsf {0}\), then there exists a \(\gamma _1>0\) for which the expression on the right-hand side of (8.84) is positive, which implies \(a>0\) or \(b>0\). However, the possibility (8.80) is not yet excluded. The requirement opposite to (8.80) may be transformed, using (8.76), (8.77), (8.78), into the form

    $$\begin{aligned}&\langle \mathsf {g}_1|\mathsf {p}_1\rangle ^2 -\left[ 6\left( V(\mathsf {x}_1)-V(\mathsf {x}_0)\right) +\gamma _1\Vert \mathsf {g}_0\Vert ^2\right] \langle \mathsf {g}_1|\mathsf {p}_1\rangle \\&\quad +\left[ 3\left( V(\mathsf {x}_1)-V(\mathsf {x}_0)\right) +\gamma _1\Vert \mathsf {g}_0\Vert ^2\right] ^2 >0\ . \end{aligned}$$

    This inequality is satisfied for any value \(\langle \mathsf {g}_1|\mathsf {p}_1\rangle \) when its discriminant is negative; thus, for \(\gamma _1>0\),

    $$\begin{aligned} 4\left( V(\mathsf {x}_1)-V(\mathsf {x}_0)\right) +\gamma _1\Vert \mathsf {g}_0\Vert ^2>0\ ; \end{aligned}$$
    (8.85)

    this holds when

    $$\begin{aligned} \gamma _1>\frac{4\left( V(\mathsf {x}_0)-V_0\right) }{\Vert \mathsf {g}_0\Vert ^2}\ . \end{aligned}$$
    (8.86)

    The condition (8.86) is stronger than the condition following from (8.84); therefore, for any \(\gamma _1\) satisfying (8.86), the initial estimates \(\overline{\lambda }^{(1)}=\hat{\lambda }^{(1)}>0\), \(\underline{\Lambda }^{(1)}=\hat{\Lambda }^{(1)}>0\) exist. The value \(V_0\) is, however, unknown in general; nevertheless, all our requirements guaranteeing the existence of the estimates \(\hat{\lambda }^{(1)}>0\), \(\hat{\Lambda }^{(1)}>0\) are met as soon as the condition (8.85) is satisfied.

  2. 2.

    For \(j\ge 1\), the process described above may be used too, or simply the results for \(\hat{\lambda }^{(j+1)}\), \(\hat{\Lambda }^{(j+1)}\) are not employed.
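The bootstrap of item 1 is easy to exercise on a one-dimensional function bounded from below; the sketch below uses the illustrative choices \(V(x)=\ln \cosh x\), \(V_0=0\), and a \(\gamma _1\) that strictly satisfies (8.86).

```python
import math

V = lambda x: math.log(math.cosh(x))   # test function, bounded below by V0 = 0
g = lambda x: math.tanh(x)             # its derivative

x0, V0 = 1.5, 0.0
g0 = g(x0)
gamma1 = 1.1 * 4.0 * (V(x0) - V0) / g0 ** 2   # strictly satisfies (8.86)

p1 = -gamma1 * g0
x1 = x0 + p1
dV = V(x1) - V(x0)
a = g(x1) * p1 + g0 * p1 - 2.0 * dV           # (8.76)
b = 3.0 * dV - 2.0 * g0 * p1 - g(x1) * p1     # (8.77)
c = g0 * p1                                   # (8.78)
```

By (8.84), \(a+b=\mathrm {\Delta }V+\gamma _1\Vert \mathsf {g}_0\Vert ^2>0\) (so \(a>0\) or \(b>0\)), and by (8.85), \(b^2-3ac>0\), so the initial estimates exist.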

Since the maximum possible estimate of the minimal eigenvalue of the Hessian matrix is selected, and the estimate of the maximal eigenvalue is minimized, the possibility \(\hat{\lambda }^{(j+1)}>\hat{\Lambda }^{(j+1)}\) is not excluded. In general, the use of shorter steps in the steepest descent method causes less damage, so in the inconsistent case \(\overline{\lambda }^{(j)}>\underline{\Lambda }^{(j)}\), we prefer the application of the value \(l_j=\overline{\lambda }^{(j)}\); this is supported also by the use of a finer interpolation during its calculation.


Kalousek, Z. Steepest Descent Method with Random Step Lengths. Found Comput Math 17, 359–422 (2017). https://doi.org/10.1007/s10208-015-9290-8
