
On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-objective Black-Box Optimization

Journal of Optimization Theory and Applications

Abstract

Stochastic search methods for global optimization and multi-objective optimization are widely used in practice, especially on problems with black-box objective and constraint functions. Although there are many theoretical results on the convergence of stochastic search methods, relatively few deal with black-box constraints and multiple black-box objectives, and previous convergence analyses require the iterates to be feasible. Moreover, some of the convergence conditions are difficult to verify for practical stochastic algorithms, and some of the theoretical results apply only to specific algorithms. First, this article presents technical conditions that guarantee the convergence of a general class of adaptive stochastic algorithms for constrained black-box global optimization without requiring the iterates to always be feasible, and it applies these conditions to practical algorithms, including an evolutionary algorithm. The conditions are required only for a subsequence of the iterations and provide a recipe for making any algorithm converge to the global minimum in a probabilistic sense. Second, the article uses the results for constrained optimization to derive convergence results for stochastic search methods for constrained multi-objective optimization.


References

  1. Baba, N.: Convergence of a random optimization method for constrained optimization problems. J. Optim. Theory Appl. 33(4), 451–461 (1981)


  2. Price, W.L.: Global optimization by controlled random search. J. Optim. Theory Appl. 40(3), 333–348 (1983)


  3. Fonseca, C.M., Fleming, P.J.: Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: Forrest, S. (ed.) Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 416–423. Morgan Kaufmann, San Mateo, CA (1993)

  4. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)


  5. Solis, F.J., Wets, R.J.B.: Minimization by random search techniques. Math. Oper. Res. 6(1), 19–30 (1981)


  6. Pinter, J.D.: Global Optimization in Action. Kluwer Academic Publishers, Dordrecht (1996)


  7. Stephens, C.P., Baritompa, W.: Global optimization requires global information. J. Optim. Theory Appl. 96(3), 575–588 (1998)


  8. Spall, J.C.: Introduction to Stochastic Search and Optimization. Wiley, New Jersey (2003)


  9. Zabinsky, Z.B.: Stochastic Adaptive Search in Global Optimization. Springer US, New York (2003). http://www.springer.com/us/book/9781402075261?cm_mmc=sgw-_-ps-_-book-_-1-4020-7526-X

  10. Birbil, S., Fang, S.C., Sheu, R.L.: On the convergence of a population-based global optimization algorithm. J. Global Optim. 30(2–3), 301–318 (2004)


  11. Pinter, J.D.: Convergence properties of stochastic optimization procedures. Optim. J. Math. Program. Oper. Res. 15(3), 405–427 (1984)


  12. Baba, N., Takeda, H., Miyake, T.: Interactive multi-objective programming technique using random optimization method. Int. J. Syst. Sci. 19(1), 151–159 (1988)


  13. Hanne, T.: On the convergence of multiobjective evolutionary algorithms. Eur. J. Oper. Res. 117, 553–564 (1999)


  14. Rudolph, G., Agapie, A.: Convergence properties of some multi-objective evolutionary algorithms. In: Proceedings of the 2000 Congress on Evolutionary Computation (CEC 2000), vol. 2, pp. 1010–1016. IEEE, La Jolla, CA (2000)

  15. Laumanns, M., Thiele, L., Deb, K., Zitzler, E.: Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263–282 (2002)


  16. Schütze, O., Laumanns, M., Coello, C.A.C., Dellnitz, M., Talbi, E.: Convergence of stochastic search algorithms to finite size Pareto set approximations. J. Global Optim. 41(4), 559–577 (2008)


  17. Brockhoff, D.: Theoretical aspects of evolutionary multiobjective optimization. In: Auger, A., Doerr, B. (eds.) Theory of Randomized Search Heuristics: Foundations and Recent Developments, pp. 101–139. World Scientific Publishing Co., Inc., River Edge, NJ (2011). http://dl.acm.org/citation.cfm?id=1996312

  18. Regis, R.G.: Convergence guarantees for generalized adaptive stochastic search methods for continuous global optimization. Eur. J. Oper. Res. 207(3), 1187–1202 (2010)


  19. Resnick, S.I.: A Probability Path. Birkhäuser, Boston (1999)


  20. Hansen, N.: The CMA evolution strategy: a comparing review. In: Lozano, J.A., Larrañaga, P., Inza, I., Bengoetxea, E. (eds.) Towards a New Evolutionary Computation: Advances in Estimation of Distribution Algorithms, pp. 75–102. Springer, Berlin, Heidelberg (2006). http://www.springer.com/us/book/9783540290063

  21. Fang, K.T., Zhang, Y.T.: Generalized Multivariate Analysis. Science Press, Springer, Beijing (1990)


  22. Regis, R.G.: Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans. Evol. Comput. 18(3), 326–347 (2014)


  23. Bäck, T., Rudolph, G., Schwefel, H.-P.: Evolutionary programming and evolution strategies: similarities and differences. In: Fogel, D.B., Atmar, J.W. (eds.) Proceedings of the Second Annual Conference on Evolutionary Programming, pp. 11–22. Evolutionary Programming Society, La Jolla, CA (1993)

  24. Miettinen, K.: Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston, MA (1999)


  25. Geoffrion, A.M.: Proper efficiency and the theory of vector maximization. J. Math. Anal. Appl. 22(3), 618–630 (1968)


  26. Soland, R.M.: Multicriteria optimization: a general characterization of efficient solutions. Decis. Sci. 10(1), 26–38 (1979)


  27. Yu, P.L.: A class of solutions for group decision problems. Manag. Sci. 19(8), 936–946 (1973)


  28. Zeleny, M.: Compromise programming. In: Cochrane, J.L., Zeleny, M. (eds.) Multiple Criteria Decision Making, pp. 262–301. University of South Carolina Press, Columbia, SC (1973)


  29. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, New York (2006)



Acknowledgments

The author thanks the anonymous reviewers for their comments. He also thanks Saint Joseph’s University for awarding him a Michael J. Morris Grant for this research.

Author information

Correspondence to Rommel G. Regis.

Appendix

This section provides complete proofs of some of the results mentioned earlier.

Proof of Proposition 2.1

Fix \(\epsilon >0\) and define \({\mathcal {S}}_{\epsilon } := \{x \in {\mathcal {D}} : f(x)<f^*+\epsilon \}\). By assumption,

$$\begin{aligned} P[X_{n_k} \in {\mathcal {S}}_{\epsilon }\ |\ \sigma (\mathcal {E}_{(n_k)-1})] \ge P[Y_{n_k} \in {\mathcal {S}}_{\epsilon }\ |\ \sigma (\mathcal {E}_{(n_k)-1})] \ge L(\epsilon ),\ \text{ for } \text{ any } k \ge 1. \end{aligned}$$

Now for each \(k \ge 1\), we have

$$\begin{aligned}&P[X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_k} \not \in {\mathcal {S}}_{\epsilon }] \\&\quad = \prod _{i=1}^k P[X_{n_i} \not \in {\mathcal {S}}_{\epsilon }| X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {S}}_{\epsilon }]. \end{aligned}$$

By conditioning on the random elements in \(\mathcal {E}_{(n_i)-1}\), and noting that the event \([X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {S}}_{\epsilon }]\) is determined by these random elements, it is easy to check that

$$\begin{aligned} P[X_{n_i} \in {\mathcal {S}}_{\epsilon }| X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {S}}_{\epsilon }] \ge L(\epsilon ). \end{aligned}$$

Thus,

$$\begin{aligned}&P[X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_k} \not \in {\mathcal {S}}_{\epsilon }] \\&\quad = \prod _{i=1}^k P[X_{n_i} \not \in {\mathcal {S}}_{\epsilon }| X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {S}}_{\epsilon }]\\&\quad = \prod _{i=1}^k \left( 1-P[X_{n_i} \in {\mathcal {S}}_{\epsilon }| X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {S}}_{\epsilon }] \right) \le \left( 1-L(\epsilon )\right) ^k. \end{aligned}$$

Observe that if i is the smallest index such that \(X_i \in {\mathcal {S}}_{\epsilon }\), it follows that \(X_i^*=X_i\) and \(X_n^* \in {\mathcal {S}}_{\epsilon }\) for all \(n \ge i\). Consequently, if \(X_{n_k}^* \not \in {\mathcal {S}}_{\epsilon }\), then \(X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_k} \not \in {\mathcal {S}}_{\epsilon }\). Hence, for each \(k \ge 1\),

$$\begin{aligned} P[f(X_{n_k}^*)-f^* \ge \epsilon ]= & {} P[f(X_{n_k}^*) \ge f^*+\epsilon ] \le P[X_{n_k}^* \not \in {\mathcal {S}}_{\epsilon }] \\\le & {} P[X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_k} \not \in {\mathcal {S}}_{\epsilon }] \le \left( 1-L(\epsilon )\right) ^k, \end{aligned}$$

and so, \(\displaystyle {\lim _{k \rightarrow \infty }P[f(X_{n_k}^*)-f^* \ge \epsilon ]=0}\), i.e., \(f(X_{n_k}^*) \longrightarrow f^*\) in probability. By a standard result in probability theory (e.g., see [19], Theorem 6.3.1(b)), \(f(X_{n_{k(i)}}^*) \longrightarrow f^*\) a.s. as \(i \rightarrow \infty \) for some subsequence \(\{n_{k(i)}\}_{i \ge 1}\). Hence, \(\exists \mathcal {H}\subseteq \Omega \) such that \(P(\mathcal {H})=0\) and \(\displaystyle {\lim _{i \rightarrow \infty } f(X_{n_{k(i)}}^*(\omega ))=f^*}\) for all \(\omega \in \Omega {\setminus } \mathcal {H}\).

Next, define

$$\begin{aligned} \mathcal {I}\!:=\! \{ \omega \in \Omega \ :\ X_{n_k}^*(\omega )\! \not \in \! {\mathcal {D}}\ \text{ for } \text{ all } \text{ k }\} \!=\! \bigcap _{k=1}^{\infty } \{ \omega \in \Omega \ :\ X_{n_k}^*(\omega ) \!\not \in \! {\mathcal {D}}\} = \bigcap _{k=1}^{\infty } [X_{n_k}^* \not \in {\mathcal {D}}]. \end{aligned}$$

We wish to show that \(P(\mathcal {I})=0\). Since \(\{[X_{n_k}^* \not \in {\mathcal {D}}]\}_{k \ge 1}\) is a decreasing sequence of events in the \(\sigma \)-field \(\mathcal {B}\), it follows that

$$\begin{aligned} P(\mathcal {I}) = P( \bigcap _{k=1}^{\infty } [X_{n_k}^* \not \in {\mathcal {D}}] ) = \lim _{k \rightarrow \infty } P(X_{n_k}^* \not \in {\mathcal {D}}). \end{aligned}$$

Now, for all \(k \ge 1\),

$$\begin{aligned} \begin{array}{lcl} P(X_{n_k}^* \not \in {\mathcal {D}}) &{} \le &{} P(X_{n_1} \not \in {\mathcal {D}}, X_{n_2} \not \in {\mathcal {D}}, \ldots , X_{n_k} \not \in {\mathcal {D}}) \\ &{}=&{} \prod _{i=1}^k P[X_{n_i} \not \in {\mathcal {D}} | X_{n_1} \not \in {\mathcal {D}}, X_{n_2} \not \in {\mathcal {D}}, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {D}}] \\ &{} = &{} \prod _{i=1}^k \left( 1-P[X_{n_i} \in {\mathcal {D}} | X_{n_1} \not \in {\mathcal {D}}, X_{n_2} \not \in {\mathcal {D}}, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {D}}] \right) \\ \end{array} \end{aligned}$$

Moreover, for each \(i=1,\ldots ,k\),

$$\begin{aligned}&P[X_{n_i} \in {\mathcal {D}} | X_{n_1} \not \in {\mathcal {D}}, X_{n_2} \not \in {\mathcal {D}}, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {D}}] \\&\quad \ge P[X_{n_i} \in {{\mathcal {S}}_{\epsilon }} | X_{n_1} \not \in {\mathcal {D}}, X_{n_2} \not \in {\mathcal {D}}, \ldots , X_{n_{(i-1)}} \not \in {\mathcal {D}}] \ge L(\epsilon ). \end{aligned}$$

Hence, for all \(k \ge 1\), \(P(\mathcal {I}) \le P(X_{n_k}^* \not \in {\mathcal {D}}) \le (1-L(\epsilon ))^k\), and so, \(P(\mathcal {I}) \le \lim _{k \rightarrow \infty } (1-L(\epsilon ))^k = 0\). This shows that \(P(\mathcal {I})=0\).

Clearly, \(P(\mathcal {I}\cup \mathcal {H}) \le P(\mathcal {I})+P(\mathcal {H})=0\), and so, \(P(\mathcal {I}\cup \mathcal {H})=0\). Next, fix \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {H})\). Since \(\omega \in \Omega {\setminus } \mathcal {I}\), \(\exists k(\omega )\) such that \(X_{n_{k(\omega )}}^*(\omega ) \in {\mathcal {D}}\), and so, \(X_n^*(\omega ) \in {\mathcal {D}}\) for all \(n \ge n_{k(\omega )}\). Hence, \(\{f(X_n^*(\omega ))\}_{n \ge n_{k(\omega )}}\) is monotonically non-increasing. Moreover, \(f(X_n^*(\omega )) \ge f^*\) for all \(n \ge n_{k(\omega )}\). Hence, \(\lim _{n \rightarrow \infty } f(X_n^*(\omega ))\) exists. Also, since \(\omega \in \Omega {\setminus } \mathcal {H}\), it follows that \(\lim _{i \rightarrow \infty } f(X_{n_{k(i)}}^*(\omega ))=f^*\). Hence, \(\lim _{n \rightarrow \infty } f(X_n^*(\omega ))=f^*\). This shows that \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
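
To make the role of the subsequence condition concrete, the following minimal numerical sketch (a hypothetical test problem and sampler, not from the paper) mimics the argument above: on a subsequence of iterations the trial point is drawn uniformly from the box, so the probability of hitting \({\mathcal {S}}_{\epsilon }\) on those iterations is bounded below by a positive constant \(L(\epsilon )\) (under mild regularity of the feasible region), while the remaining iterations are adaptive steps that may produce infeasible points.

```python
import numpy as np

# Minimal numerical sketch of the argument above (hypothetical test problem and
# sampler, not from the paper). On a subsequence of iterations the trial point
# is drawn uniformly from the box, so it lands in S_eps = {x in D : f(x) < f* + eps}
# with probability bounded below by some L(eps) > 0; the remaining iterations
# are "adaptive" Gaussian steps that are allowed to be infeasible. The best
# feasible objective value then converges to f*, as the (1 - L(eps))^k bound suggests.

rng = np.random.default_rng(0)
lo, hi, d = -1.0, 1.0, 2
x_star = np.array([0.3, 0.3])                      # global minimizer (feasible here)

f = lambda x: float(np.sum((x - x_star) ** 2))     # black-box objective, f* = 0
feasible = lambda x: x[0] + x[1] >= 0.0            # black-box constraint defining D

best_x, best_f = None, float("inf")
for n in range(1, 5001):
    if n % 5 == 0 or best_x is None:
        y = rng.uniform(lo, hi, size=d)            # "global" subsequence: uniform on the box
    else:
        y = best_x + 0.1 * rng.standard_normal(d)  # adaptive step; may be infeasible
    if feasible(y) and f(y) < best_f:              # update the best feasible point X_n^*
        best_x, best_f = y, f(y)

print("best feasible objective value:", best_f)    # approaches f* = 0
```

The uniform subsequence plays the role of the condition on \(Y_{n_k}\): no matter how the adaptive steps behave, the best feasible objective value is driven to \(f^*\).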

Proof of Proposition 2.2

Fix \(\epsilon >0\) and let \(\widetilde{f} := \inf \{ f(x) : x \in {\mathcal {D}}, \Vert x-x^*\Vert \ge \epsilon \}\). Since \(f(X_n^*) \longrightarrow f(x^*)\) a.s., it follows that \(\exists \mathcal {N}\subseteq \Omega \) with \(P(\mathcal {N})=0\) such that \(f(X_n^*(\omega )) \longrightarrow f(x^*)\) for all \(\omega \in \Omega {\setminus } {\mathcal {N}}\). As in the proof of Proposition 2.1, define \(\mathcal {I}:= \{ \omega \in \Omega \ :\ X_{n_k}^*(\omega ) \not \in {\mathcal {D}}\ \text{ for } \text{ all } k\} = \bigcap _{k=1}^{\infty } [X_{n_k}^* \not \in {\mathcal {D}}]\). It was shown there that \(P(\mathcal {I})=0\).

Note that \(P(\mathcal {I}\cup \mathcal {N}) = 0\). Fix \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {N})\). Since \(\omega \in \Omega {\setminus } \mathcal {N}\), we have \(f(X_n^*(\omega )) \longrightarrow f(x^*)\). By assumption, \(\widetilde{f} - f(x^*) > 0\). Hence, there is an integer \(N(\omega )\) such that for all \(n \ge N(\omega )\), we have

$$\begin{aligned} f(X_n^*(\omega )) - f(x^*) = |f(X_n^*(\omega )) - f(x^*)| < \widetilde{f} - f(x^*), \end{aligned}$$

or equivalently, \(f(X_n^*(\omega )) < \widetilde{f}\). Moreover, since \(\omega \in \Omega {\setminus } \mathcal {I}\), there is an integer \(k(\omega )\) such that \(X_{n_{k(\omega )}}^*(\omega ) \in \mathcal {D}\). This implies that \(X_n^*(\omega ) \in \mathcal {D}\) for all \(n \ge n_{k(\omega )}\).

Now, for any \(n \ge \max (N(\omega ), n_{k(\omega )})\), we have \(X_n^*(\omega ) \in \mathcal {D}\) and \(f(X_n^*(\omega ))<\widetilde{f}\). Note that we must have \(\Vert X_n^*(\omega )-x^*\Vert <\epsilon \). (Otherwise, if \(\Vert X_n^*(\omega ) - x^*\Vert \ge \epsilon \), then \(f(X_n^*(\omega )) \ge \inf \{ f(x) : x \in {\mathcal {D}}, \Vert x-x^*\Vert \ge \epsilon \} = \widetilde{f}\), which is a contradiction.) Since \(\epsilon >0\) was arbitrary and the set \(\mathcal {I}\cup \mathcal {N}\) does not depend on \(\epsilon \), this shows that \(X_n^*(\omega ) \longrightarrow x^*\) for each \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {N})\). Thus, \(X_n^* \longrightarrow x^*\) a.s. \(\square \)
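
As a simple illustration of the gap condition \(\widetilde{f} > f(x^*)\) used above (a one-dimensional example, not from the paper), take

$$\begin{aligned} {\mathcal {D}}=[-1,1],\qquad f(x)=x^2,\qquad x^*=0,\qquad \widetilde{f} = \inf \{x^2 : x \in [-1,1],\ |x| \ge \epsilon \} = \epsilon ^2 > 0 = f(x^*). \end{aligned}$$

Once \(f(X_n^*(\omega ))<\epsilon ^2\), the point \(X_n^*(\omega )\) must satisfy \(|X_n^*(\omega )|<\epsilon \), so convergence of the objective values forces convergence of the iterates to \(x^*\).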

Proof of Proposition 2.4

Since \(S \ne \emptyset \), the given assumption implies that \(\text{ int }(S) \ne \emptyset \) (otherwise, \(\text{ cl }(S)=\text{ cl }(\text{ int }(S))=\emptyset \), contradicting \(S \ne \emptyset \)). Next, if \(\text{ bd }(S)=\emptyset \), then the statement is vacuously true, so assume that \(\text{ bd }(S) \ne \emptyset \) and let \(x \in \text{ bd }(S)\). Then, \(x \in \text{ cl }(S)\). Since \(\text{ cl }(\text{ int }(S))=\text{ cl }(S)\), it follows that \(x \in \text{ cl }(\text{ int }(S))\). Since \(x \not \in \text{ int }(S)\), it follows that \(x\) is a limit point of \(\text{ int }(S)\). Thus, every neighborhood of \(x\) contains an interior point of \(S\).

The second part of the proposition follows from a result in [29] (Corollary 3 p. 48), which states that if C is a convex set in \(\mathbb {R}^d\) with a nonempty interior, then \(\text{ cl }(\text{ int }(C))=\text{ cl }(C)\). \(\square \)
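
For comparison (an illustration, not from the paper), the assumption \(\text{ cl }(\text{ int }(S))=\text{ cl }(S)\) fails for a lower-dimensional set such as

$$\begin{aligned} S=[0,1]\times \{0\}\subset \mathbb {R}^2:\qquad \text{ int }(S)=\emptyset ,\qquad \text{ cl }(\text{ int }(S))=\emptyset \ne \text{ cl }(S)=S, \end{aligned}$$

whereas any convex set with nonempty interior, such as a closed ball in \(\mathbb {R}^d\), satisfies it by the cited corollary.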

Proof of Corollary 2.1

For each \(k \ge 1\), the conditional distribution of \(Y_{n_k}\) given \(\sigma (\mathcal {E}_{(n_k)-1})\) is uniform over the box \([\ell ,u]\): \(h_{n_k}(y \ |\ \sigma (\mathcal {E}_{(n_k)-1})) = 1/\mu ([\ell ,u]),\ y \in [\ell ,u]\). Here, \(\mu ([\ell ,u])=\prod _{i=1}^d (u^{(i)}-\ell ^{(i)})\). Hence, for any \(y \in [\ell ,u]\),

$$\begin{aligned} h(y) := \inf _{k \ge 1} h_{n_k}(y \ |\ \sigma (\mathcal {E}_{(n_k)-1})) = 1/\mu ([\ell ,u]) > 0, \end{aligned}$$

and so, \(\mu (\{y \in [\ell ,u]\ :\ h(y)=0\})=\mu (\emptyset )=0\). By Proposition 2.5, \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
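
As a concrete instance of the computation above (the box dimensions are chosen only for illustration), take

$$\begin{aligned} d=2,\quad [\ell ,u]=[0,1]\times [0,2]:\qquad \mu ([\ell ,u])=(1-0)(2-0)=2,\qquad h(y)=\tfrac{1}{2}>0\ \text{ for } \text{ all }\ y \in [\ell ,u], \end{aligned}$$

so the set where \(h\) vanishes inside the box is empty.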

Proof of Proposition 2.6

For each \(k \ge 1\), the conditional distribution of \(Y_{n_k}\) given \(\sigma (\mathcal {E}_{(n_k)-1})\) is an elliptical distribution with conditional density

$$\begin{aligned} h_{n_k}(y \ |\ \sigma (\mathcal {E}_{(n_k)-1})) = \gamma [\det (C_k)]^{-1/2}\ \Psi ((y-u_k)^TC_k^{-1}(y-u_k)), \qquad y \in \mathbb {R}^d, \end{aligned}$$

where \(u_k \in \mathbb {R}^d\) is a realization of the random vector \(U_k\). By the same argument as in the proof of Theorem 6 in [18], it can be shown that for any \(y \in [\ell ,u]\),

$$\begin{aligned} h(y) \!:=\! \inf _{k \ge 1} h_{n_k}(y \ |\ \sigma (\mathcal {E}_{(n_k)-1})) \ge \gamma \left( \sup _{k \ge 1} \lambda _\mathrm{max}(C_k) \right) ^{-d/2} \Psi \left( \frac{\text{ diam }([\ell ,u])^2}{\inf _{k \ge 1} \lambda _\mathrm{min}(C_k)} \right) \!>\! 0, \end{aligned}$$

where \(\text{ diam }([\ell ,u]):=\Vert u-\ell \Vert \) is the largest distance between any two points in \([\ell ,u]\). Hence, \(\mu (\{y \in [\ell ,u]\ :\ h(y)=0\})=\mu (\emptyset )=0\). By Proposition 2.5, \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
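
As a concrete instance of this lower bound (an illustration of the Gaussian case; the constants \(\sigma _\mathrm{min}\) and \(\sigma _\mathrm{max}\) are assumptions), take \(C_k=\sigma _k^2 I_d\) with \(\sigma _\mathrm{min}^2 \le \sigma _k^2 \le \sigma _\mathrm{max}^2\) for all \(k\), \(\Psi (t)=e^{-t/2}\), and \(\gamma =(2\pi )^{-d/2}\). The bound then specializes to

$$\begin{aligned} h(y) \ge (2\pi )^{-d/2}\, \sigma _\mathrm{max}^{-d}\, \exp \left( -\frac{\Vert u-\ell \Vert ^2}{2\,\sigma _\mathrm{min}^2}\right) > 0, \qquad y \in [\ell ,u]. \end{aligned}$$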

Proof of Proposition 2.7

As before, we first check that the above EP algorithm follows the GARSCO framework. For \(n=1,2,\ldots ,\mu \), we have \(k_n=1\) and \(\Lambda _{n,1} = Y_n\). Moreover, for all \(n \ge \mu +1\), we have \(k_n=2\) and

$$\begin{aligned} \Lambda _{n,1}=Z_n\qquad \text{ and }\qquad \Lambda _{n,2}=\Xi _{t,i}=[\xi _{t,i}^{(0)},\xi _{t,i}^{(1)},\ldots ,\xi _{t,i}^{(d)}], \end{aligned}$$

where t and i are the unique integers such that \(t \ge 1\), \(1 \le i \le \mu \) and \(n=t\mu +i\). Hence,

$$\begin{aligned} \displaystyle { {\mathcal {E}}_{t\mu +i-1} = \{ Y_1,\ldots ,Y_{\mu }\} \bigcup \left( \displaystyle { \bigcup _{s=1}^{t-1} \bigcup _{j=1}^{\mu } \{Z_{s\mu +j},\ \Xi _{s,j} \} } \right) \bigcup \left( \displaystyle { \bigcup _{j=1}^{i-1} \{Z_{t\mu +j},\ \Xi _{t,j} \} } \right) }. \end{aligned}$$

Since the selection of the new parent population in Step 3.4 is done in a greedy manner, it follows that for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), \(X(P_i(t-1))\) is a deterministic function of \(Y_1,Y_2,\ldots ,Y_{t\mu }\). This also implies that for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), \(X(P_i(t-1))\) is also a deterministic function of \(Y_1,Y_2,\ldots ,Y_{t\mu +i-1}\). Hence, for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), we have \(Y_{t\mu +i} = \Phi _{(t-1)\mu +i}(\mathcal {E}_{t\mu +i-1}) + Z_{t\mu +i}\), for some deterministic function \(\Phi _{(t-1)\mu +i}\). Note that this implies that

$$\begin{aligned} Y_{\mu +k} = \Phi _k(\mathcal {E}_{\mu +k-1}) + Z_{\mu +k},\ \text{ for } \text{ all }\ k \ge 1 \end{aligned}$$

where \(Z_{\mu +k}\) is a random vector whose conditional distribution given \(\sigma ({\mathcal {E}}_{\mu +k-1})\) is a normal distribution with mean vector 0 and diagonal covariance matrix

$$\begin{aligned} \text{ Cov }(Z_{\mu +k}) = \text{ diag }\left( \left( \sigma _{\mu +k}^{(1)}\right) ^2, \left( \sigma _{\mu +k}^{(2)}\right) ^2, \ldots , \left( \sigma _{\mu +k}^{(d)}\right) ^2 \right) \end{aligned}$$

For each integer \(k \ge 1\) and \(j=1,2,\ldots ,d\), we have \(\left( \sigma _{\mu +k}^{(j)} \right) ^2 \ge \sigma _\mathrm{min}^2 > 0\). Define the subsequence \(\{n_k\}_{k \ge 1}\) by \(n_k:=\mu +k\) for all \(k \ge 1\). Then, we have \(Y_{n_k} = \Phi _k(\mathcal {E}_{(n_k)-1}) + W_k\), for all \(k \ge 1\), where \(W_k=Z_{n_k}\). Let \(\lambda _k\) be the smallest eigenvalue of \(\text{ Cov }(W_k)\). Since the eigenvalues of \(\text{ Cov }(W_k)\) are \(\left( \sigma _{\mu +k}^{(1)}\right) ^2, \ldots , \left( \sigma _{\mu +k}^{(d)}\right) ^2\), we have \(\displaystyle {\lambda _k = \min _{1 \le j \le d} \left( \sigma _{\mu +k}^{(j)} \right) ^2 \ge \sigma _\mathrm{min}^2}\), and so, \(\inf _{k \ge 1} \lambda _k \ge \sigma _\mathrm{min}^2 > 0\). Moreover, the conditional distribution of \(W_k\) given \(\sigma ({\mathcal {E}}_{(n_k)-1})\) is an elliptical distribution with conditional density given by

$$\begin{aligned} q_k(w\ |\ \sigma (\mathcal {E}_{(n_k)-1})) = \gamma [\det (C_k)]^{-1/2}\ \Psi (w^TC_k^{-1}w),\quad w \in \mathbb {R}^d, \end{aligned}$$

where \(C_k = \text{ Cov }(W_k)\), \(\Psi (y)=e^{-y/2}\), and \(\gamma = (2\pi )^{-d/2}\). Again, \(\Psi \) is monotonically non-increasing, and so, the conclusion follows from Proposition 2.6. \(\square \)
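
The following minimal sketch (hypothetical parameter values; the lognormal step-size update is one common EP/ES rule and is an assumption here, not necessarily the exact rule of the algorithm analyzed above) illustrates the structure used in this proof: each offspring is a deterministic function of the history plus a Gaussian vector with diagonal covariance whose entries are floored at \(\sigma _\mathrm{min}^2\), so that the smallest eigenvalue of its covariance stays bounded away from zero, which is what Proposition 2.6 requires.

```python
import numpy as np

# Minimal sketch of the structure used in the proof above (hypothetical parameter
# values; the lognormal step-size update is one common EP/ES rule and is an
# assumption here). Each offspring equals a deterministic function of the history
# (the selected parent) plus a Gaussian vector Z with diagonal covariance whose
# entries are floored at sigma_min^2, so lambda_min(Cov(Z)) >= sigma_min^2 > 0,
# which is the condition needed to invoke Proposition 2.6.

rng = np.random.default_rng(1)
d, sigma_min = 3, 1e-2
tau, tau_prime = 1.0 / np.sqrt(2.0 * np.sqrt(d)), 1.0 / np.sqrt(2.0 * d)

def mutate(parent_x, parent_sigma):
    # self-adaptive (lognormal) update of the step sizes, then floor them at sigma_min
    sigma = parent_sigma * np.exp(tau_prime * rng.standard_normal()
                                  + tau * rng.standard_normal(d))
    sigma = np.maximum(sigma, sigma_min)
    z = sigma * rng.standard_normal(d)    # Z ~ N(0, diag(sigma^2))
    return parent_x + z, sigma

x, s = np.zeros(d), 0.5 * np.ones(d)
for _ in range(100):
    x, s = mutate(x, s)

# the smallest eigenvalue of the diagonal covariance is min(sigma^2) >= sigma_min^2
print(float(np.min(s ** 2)) >= sigma_min ** 2)   # True
```

Without the floor \(\sigma _\mathrm{min}\), the step sizes could collapse to zero and the uniform lower bound \(\inf _{k \ge 1} \lambda _k > 0\) used above would be lost.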


About this article


Cite this article

Regis, R.G. On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-objective Black-Box Optimization. J Optim Theory Appl 170, 932–959 (2016). https://doi.org/10.1007/s10957-016-0977-z

