Abstract
Stochastic search methods for global optimization and multi-objective optimization are widely used in practice, especially on problems with black-box objective and constraint functions. Although there are many theoretical results on the convergence of stochastic search methods, relatively few deal with black-box constraints and multiple black-box objectives, and previous convergence analyses require feasible iterates. Moreover, some of the convergence conditions are difficult to verify for practical stochastic algorithms, and some of the theoretical results apply only to specific algorithms. First, this article presents technical conditions that guarantee the convergence of a general class of adaptive stochastic algorithms for constrained black-box global optimization that do not require the iterates to always be feasible, and applies them to practical algorithms, including an evolutionary algorithm. The conditions are only required for a subsequence of the iterations and provide a recipe for making any algorithm converge to the global minimum in a probabilistic sense. Second, it uses the results for constrained optimization to derive convergence results for stochastic search methods for constrained multi-objective optimization.
References
Baba, N.: Convergence of a random optimization method for constrained optimization problems. J. Optim. Theory Appl. 33(4), 451–461 (1981)
Price, W.L.: Global optimization by controlled random search. J. Optim. Theory Appl. 40(3), 333–348 (1983)
Fonseca, C.M., Fleming, P.J.: Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: Forrest, S. (ed.) Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 416–423. Morgan Kaufmann, San Mateo, CA (1993)
Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
Solis, F.J., Wets, R.J.B.: Minimization by random search techniques. Math. Oper. Res. 6(1), 19–30 (1981)
Pinter, J.D.: Global Optimization in Action. Kluwer Academic Publishers, Dordrecht (1996)
Stephens, C.P., Baritompa, W.: Global optimization requires global information. J. Optim. Theory Appl. 96(3), 575–588 (1998)
Spall, J.C.: Introduction to Stochastic Search and Optimization. Wiley, New Jersey (2003)
Zabinsky, Z.B.: Stochastic Adaptive Search in Global Optimization. Springer US, New York (2003)
Birbil, S., Fang, S.C., Sheu, R.L.: On the convergence of a population-based global optimization algorithm. J. Global Optim. 30(2–3), 301–318 (2004)
Pinter, J.D.: Convergence properties of stochastic optimization procedures. Optim. J. Math. Program. Oper. Res. 15(3), 405–427 (1984)
Baba, N., Takeda, H., Miyake, T.: Interactive multi-objective programming technique using random optimization method. Int. J. Syst. Sci. 19(1), 151–159 (1988)
Hanne, T.: On the convergence of multiobjective evolutionary algorithms. Eur. J. Oper. Res. 117, 553–564 (1999)
Rudolph, G., Agapie, A.: Convergence properties of some multi-objective evolutionary algorithms. In: Proceedings of the 2000 Congress on Evolutionary Computation (CEC 2000), vol. 2, pp. 1010–1016. IEEE, La Jolla, CA (2000)
Laumanns, M., Thiele, L., Deb, K., Zitzler, E.: Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263–282 (2002)
Schütze, O., Laumanns, M., Coello, C.A.C., Dellnitz, M., Talbi, E.: Convergence of stochastic search algorithms to finite size Pareto set approximations. J. Global Optim. 41(4), 559–577 (2008)
Brockhoff, D.: Theoretical aspects of evolutionary multiobjective optimization. In: Auger, A., Doerr, B. (eds.) Theory of Randomized Search Heuristics: Foundations and Recent Developments, pp. 101–139. World Scientific Publishing Co., Inc., River Edge, NJ (2011)
Regis, R.G.: Convergence guarantees for generalized adaptive stochastic search methods for continuous global optimization. Eur. J. Oper. Res. 207(3), 1187–1202 (2010)
Resnick, S.I.: A Probability Path. Birkhäuser, Boston (1999)
Hansen, N.: The CMA evolution strategy: a comparing review. In: Lozano, J.A., Larrañaga, P., Inza, I., Bengoetxea, E. (eds.) Towards a New Evolutionary Computation: Advances in Estimation of Distribution Algorithms, pp. 75–102. Springer, Berlin, Heidelberg (2006)
Fang, K.T., Zhang, Y.T.: Generalized Multivariate Analysis. Science Press, Springer, Beijing (1990)
Regis, R.G.: Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans. Evol. Comput. 18(3), 326–347 (2014)
Bäck, T., Rudolph, G., Schwefel, H.-P.: Evolutionary programming and evolution strategies: similarities and differences. In: Fogel, D.B., Atmar, J.W. (eds.) Proceedings of the Second Annual Conference on Evolutionary Programming, pp. 11–22. Evolutionary Programming Society, La Jolla, CA (1993)
Miettinen, K.: Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston, MA (1999)
Geoffrion, A.M.: Proper efficiency and the theory of vector maximization. J. Math. Anal. Appl. 22(3), 618–630 (1968)
Soland, R.M.: Multicriteria optimization: a general characterization of efficient solutions. Decis. Sci. 10(1), 26–38 (1979)
Yu, P.L.: A class of solutions for group decision problems. Manag. Sci. 19(8), 936–946 (1973)
Zeleny, M.: Compromise programming. In: Cochrane, J.L., Zeleny, M. (eds.) Multiple Criteria Decision Making, pp. 262–301. University of South Carolina Press, Columbia, SC (1973)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, New York (2006)
Acknowledgments
The author thanks the anonymous reviewers for their comments. He also thanks Saint Joseph’s University for awarding him a Michael J. Morris Grant for this research.
Appendix
This section provides complete proofs of some of the results mentioned earlier.
Proof of Proposition 2.1
Fix \(\epsilon >0\) and define \({\mathcal {S}}_{\epsilon } := \{x \in {\mathcal {D}} : f(x)<f^*+\epsilon \}\). By assumption,
Now for each \(k \ge 1\), we have
By conditioning on the random elements in \(\mathcal {E}_{(n_i)-1}\), it is easy to check that for each \(\epsilon >0\), we have
Thus,
Observe that if i is the smallest index such that \(X_i \in {\mathcal {S}}_{\epsilon }\), it follows that \(X_i^*=X_i\) and \(X_n^* \in {\mathcal {S}}_{\epsilon }\) for all \(n \ge i\). Consequently, if \(X_{n_k}^* \not \in {\mathcal {S}}_{\epsilon }\), then \(X_{n_1} \not \in {\mathcal {S}}_{\epsilon }, X_{n_2} \not \in {\mathcal {S}}_{\epsilon }, \ldots , X_{n_k} \not \in {\mathcal {S}}_{\epsilon }\). Hence, for each \(k \ge 1\),
and so, \(\displaystyle {\lim _{k \rightarrow \infty }P[f(X_{n_k}^*)-f^* \ge \epsilon ]=0}\), i.e., \(f(X_{n_k}^*) \longrightarrow f^*\) in probability. By a standard result in probability theory (e.g., see [19], Theorem 6.3.1(b)), \(f(X_{n_{k(i)}}^*) \longrightarrow f^*\) a.s. as \(i \rightarrow \infty \) for some subsequence \(\{n_{k(i)}\}_{i \ge 1}\). Hence, \(\exists \mathcal {H}\subseteq \Omega \) such that \(P(\mathcal {H})=0\) and \(\displaystyle {\lim _{i \rightarrow \infty } f(X_{n_{k(i)}}^*(\omega ))=f^*}\) for all \(\omega \in \Omega {\setminus } \mathcal {H}\).
Next, define \(\mathcal {I}:= \{ \omega \in \Omega \ :\ X_{n_k}^*(\omega ) \not \in {\mathcal {D}}\ \text{ for } \text{ all } k\} = \bigcap _{k=1}^{\infty } [X_{n_k}^* \not \in {\mathcal {D}}]\).
We wish to show that \(P(\mathcal {I})=0\). Since \(\{[X_{n_k}^* \not \in {\mathcal {D}}]\}_{k \ge 1}\) is a decreasing sequence of events in the \(\sigma \)-field \(\mathcal {B}\), it follows that \(P(\mathcal {I}) = \lim _{k \rightarrow \infty } P(X_{n_k}^* \not \in {\mathcal {D}})\).
Now, for all \(k \ge 1\),
Moreover, for each \(i=1,\ldots ,k\),
Hence, for all \(k \ge 1\), \(P(\mathcal {I}) \le P(X_{n_k}^* \not \in {\mathcal {D}}) \le (1-L(\epsilon ))^k\), and so, \(P(\mathcal {I}) \le \lim _{k \rightarrow \infty } (1-L(\epsilon ))^k = 0\). This shows that \(P(\mathcal {I})=0\).
Clearly, \(P(\mathcal {I}\cup \mathcal {H}) \le P(\mathcal {I})+P(\mathcal {H})=0\), and so, \(P(\mathcal {I}\cup \mathcal {H})=0\). Next, fix \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {H})\). Since \(\omega \in \Omega {\setminus } \mathcal {I}\), \(\exists k(\omega )\) such that \(X_{n_{k(\omega )}}^*(\omega ) \in {\mathcal {D}}\), and so, \(X_n^*(\omega ) \in {\mathcal {D}}\) for all \(n \ge n_{k(\omega )}\). Hence, \(\{f(X_n^*(\omega ))\}_{n \ge n_{k(\omega )}}\) is monotonically non-increasing. Moreover, \(f(X_n^*(\omega )) \ge f^*\) for all \(n \ge n_{k(\omega )}\). Hence, \(\lim _{n \rightarrow \infty } f(X_n^*(\omega ))\) exists. Also, since \(\omega \in \Omega {\setminus } \mathcal {H}\), it follows that \(\lim _{i \rightarrow \infty } f(X_{n_{k(i)}}^*(\omega ))=f^*\). Hence, \(\lim _{n \rightarrow \infty } f(X_n^*(\omega ))=f^*\). This shows that \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
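As an illustration (not part of the original analysis), the mechanism behind Proposition 2.1 can be sketched numerically. The hypothetical one-dimensional example below tracks the best feasible point of an adaptive search whose iterates may leave the feasible region; a subsequence of uniform samples over the enclosing box plays the role of the global sampling condition.

```python
import random

# Hypothetical 1-D illustration (not an algorithm from the paper): minimize
# f(x) = (x - 0.7)**2 over D = [0.5, 1.0] inside the box [0, 1]. Iterates
# may be infeasible; only the best feasible point X_n^* is tracked, and a
# subsequence samples uniformly over the box, so the conditional sampling
# density is bounded below on D.
random.seed(0)
f = lambda x: (x - 0.7) ** 2
feasible = lambda x: 0.5 <= x <= 1.0

best = None          # best feasible point found so far (X_n^*)
center = 0.0         # adaptive search center; starts infeasible
for n in range(1, 20001):
    if n % 10 == 0:
        y = random.uniform(0.0, 1.0)          # global (uniform) subsequence
    else:
        y = center + random.gauss(0.0, 0.05)  # adaptive local step
    if feasible(y) and (best is None or f(y) < f(best)):
        best = y
    center = best if best is not None else y

assert best is not None and feasible(best)
assert abs(best - 0.7) < 1e-2                 # near the global minimizer
```

The local steps are arbitrary; only the uniform subsequence is needed for the convergence guarantee, mirroring the fact that the conditions of the proposition are required only on a subsequence of the iterations.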
Proof of Proposition 2.2
Fix \(\epsilon >0\) and let \(\widetilde{f} := \inf \{ f(x) : x \in {\mathcal {D}}, \Vert x-x^*\Vert \ge \epsilon \}\). Since \(f(X_n^*) \longrightarrow f(x^*)\) a.s., it follows that \(\exists \mathcal {N}\subseteq \Omega \) with \(P(\mathcal {N})=0\) such that \(f(X_n^*(\omega )) \longrightarrow f(x^*)\) for all \(\omega \in \Omega {\setminus } {\mathcal {N}}\). As in the proof of Proposition 2.1, define \(\mathcal {I}:= \{ \omega \in \Omega \ :\ X_{n_k}^*(\omega ) \not \in {\mathcal {D}}\ \text{ for } \text{ all } k\} = \bigcap _{k=1}^{\infty } [X_{n_k}^* \not \in {\mathcal {D}}]\). It was shown that \(P(\mathcal {I})=0\).
Note that \(P(\mathcal {I}\cup \mathcal {N}) = 0\). Fix \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {N})\). Since \(\omega \in \Omega {\setminus } \mathcal {N}\), we have \(f(X_n^*(\omega )) \longrightarrow f(x^*)\). By assumption, \(\widetilde{f} - f(x^*) > 0\). Hence, there is an integer \(N(\omega )\) such that for all \(n \ge N(\omega )\), we have \(f(X_n^*(\omega )) - f(x^*) < \widetilde{f} - f(x^*)\),
or equivalently, \(f(X_n^*(\omega )) < \widetilde{f}\). Moreover, since \(\omega \in \Omega {\setminus } \mathcal {I}\), there is an integer \(k(\omega )\) such that \(X_{n_{k(\omega )}}^*(\omega ) \in \mathcal {D}\). This implies that \(X_n^*(\omega ) \in \mathcal {D}\) for all \(n \ge n_{k(\omega )}\).
Now, for any \(n \ge \max (N(\omega ), n_{k(\omega )})\), we have \(X_n^*(\omega ) \in \mathcal {D}\) and \(f(X_n^*(\omega ))<\widetilde{f}\). Note that we must have \(\Vert X_n^*(\omega )-x^*\Vert <\epsilon \). (Otherwise, if \(\Vert X_n^*(\omega ) - x^*\Vert \ge \epsilon \), then \(f(X_n^*(\omega )) \ge \inf \{ f(x) : x \in {\mathcal {D}}, \Vert x-x^*\Vert \ge \epsilon \} = \widetilde{f}\), which is a contradiction.) This shows that \(X_n^*(\omega )~\longrightarrow ~x^*\) for each \(\omega \in \Omega {\setminus } (\mathcal {I}\cup \mathcal {N})\). Thus, \(X_n^* \longrightarrow x^*\) a.s. \(\square \)
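The separation argument above can be checked numerically. In the hypothetical one-dimensional example below, \(\widetilde{f}\) is estimated on a grid; any sequence whose objective values converge to \(f(x^*)\) eventually falls below \(\widetilde{f}\) and is therefore trapped in the \(\epsilon \)-ball around \(x^*\).

```python
# Hypothetical 1-D check of the argument in Proposition 2.2: f(x) = |x - 0.3|
# on D = [0, 1] has unique global minimizer x* = 0.3. For eps > 0, the level
# f_tilde = inf{f(x) : x in D, |x - x*| >= eps} is strictly above f(x*), so
# any point with f(x) < f_tilde must satisfy |x - x*| < eps.
f = lambda x: abs(x - 0.3)
x_star, eps = 0.3, 0.05

# estimate f_tilde on a fine grid of {x in D : |x - x*| >= eps}
grid = [i / 10000.0 for i in range(10001)]
f_tilde = min(f(x) for x in grid if abs(x - x_star) >= eps)
assert f_tilde > f(x_star)

# a sequence with f(x_n) -> f(x*) = 0; its tail below the level f_tilde
# necessarily lies in the eps-ball around x*
x_n = [0.3 + 0.37 / (n + 1) for n in range(200)]
tail = [x for x in x_n if f(x) < f_tilde]
assert tail and all(abs(x - x_star) < eps for x in tail)
```

Here \(\widetilde{f} \approx \epsilon \) because \(f\) grows linearly away from \(x^*\); the argument itself only needs \(\widetilde{f} > f(x^*)\).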
Proof of Proposition 2.4
Since \(S \ne \emptyset \), the given assumption implies that \(\text{ int }(S) \ne \emptyset \) (otherwise \(\text{ cl }(S)=\emptyset \)). Next, if \(\text{ bd }(S)=\emptyset \), then the above statement is vacuously true so assume that \(\text{ bd }(S) \ne \emptyset \) and let \(x \in \text{ bd }(S)\). Then, \(x \in \text{ cl }(S)\). Since \(\text{ cl }(\text{ int }(S))=\text{ cl }(S)\), it follows that \(x \in \text{ cl }(\text{ int }(S))\). Since \(x \not \in \text{ int }(S)\), it follows that x is a limit point of \(\text{ int }(S)\). Thus, every neighborhood of x contains an interior point of S.
The second part of the proposition follows from a result in [29] (Corollary 3 p. 48), which states that if C is a convex set in \(\mathbb {R}^d\) with a nonempty interior, then \(\text{ cl }(\text{ int }(C))=\text{ cl }(C)\). \(\square \)
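As a quick sanity check (a hypothetical example, not from the paper), the conclusion of Proposition 2.4 can be verified for the closed unit disk, a convex set with nonempty interior:

```python
import math

# Sanity check of Proposition 2.4 for the closed unit disk in R^2 (a convex
# set with nonempty interior): every neighborhood of a boundary point
# contains an interior point of the set.
boundary_pt = (1.0, 0.0)                      # on bd(S), since ||x|| = 1

def is_interior(p):
    return math.hypot(p[0], p[1]) < 1.0       # int(S) = {x : ||x|| < 1}

for n in range(1, 7):
    radius = 10.0 ** (-n)                     # shrinking neighborhood radius
    # pull the boundary point slightly toward the center of the disk
    candidate = (boundary_pt[0] * (1.0 - radius / 2.0), boundary_pt[1])
    dist = abs(candidate[0] - boundary_pt[0])
    assert is_interior(candidate) and dist < radius
```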
Proof of Corollary 2.1
For each \(k \ge 1\), the conditional distribution of \(Y_{n_k}\) given \(\sigma (\mathcal {E}_{(n_k)-1})\) is uniform over the box \([\ell ,u]\): \(h_{n_k}(y \ |\ \sigma (\mathcal {E}_{(n_k)-1})) = 1/\mu ([\ell ,u]),\ y \in [\ell ,u]\). Here, \(\mu ([\ell ,u])=\prod _{i=1}^d (u^{(i)}-\ell ^{(i)})\). Hence, for any \(y \in [\ell ,u]\), \(h(y) \ge 1/\mu ([\ell ,u]) > 0\),
and so, \(\mu (\{y \in [\ell ,u]\ :\ h(y)=0\})=\mu (\emptyset )=0\). By Proposition 2.5, \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
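For concreteness, the constant uniform density appearing in this proof is trivial to compute for a hypothetical box:

```python
# The conditional density in Corollary 2.1 is the constant 1/mu([l, u]) on
# the box, hence strictly positive everywhere on it. A hypothetical box:
l = [0.0, -1.0, 2.0]
u = [1.0, 1.0, 5.0]
volume = 1.0
for li, ui in zip(l, u):
    volume *= ui - li                 # mu([l, u]) = prod (u_i - l_i)
density = 1.0 / volume                # h(y) = 1/mu([l, u]) for y in [l, u]
assert volume == 6.0 and density > 0.0
```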
Proof of Proposition 2.6
For each \(k \ge 1\), the conditional distribution of \(Y_{n_k}\) given \(\sigma (\mathcal {E}_{(n_k)-1})\) is an elliptical distribution with conditional density
where \(u_k \in \mathbb {R}^d\) is a realization of the random vector \(U_k\). By the same argument as in the proof of Theorem 6 in [18], it can be shown that for any \(y \in [\ell ,u]\),
where \(\text{ diam }([\ell ,u]):=\Vert u-\ell \Vert \) is the largest distance between any two points in \([\ell ,u]\). Hence, \(\mu (\{y \in [\ell ,u]\ :\ h(y)=0\})=\mu (\emptyset )=0\). By Proposition 2.5, \(f(X_n^*) \longrightarrow f^*\) a.s. \(\square \)
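The density lower bound used here can be verified numerically for a diagonal Gaussian, a special case of an elliptical distribution. The constants below (box, variance bounds) are hypothetical, and the stated lower bound follows from \(\Vert y - m\Vert \le \text{diam}([\ell ,u])\) together with the variance bounds.

```python
import math

# Numerical check (hypothetical constants) of a Gaussian special case of the
# bound behind Proposition 2.6: if the mutation mean stays in the box [l, u]
# and the common variance s2 lies in [s2_min, s2_max], the density on the box
# is at least gamma * s2_max**(-d/2) * exp(-diam**2 / (2 * s2_min)), where
# diam = ||u - l|| and gamma = (2 * pi)**(-d/2).
l, u = [0.0, 0.0], [2.0, 3.0]
s2_min, s2_max = 0.5, 1.5
d = 2
diam = math.hypot(u[0] - l[0], u[1] - l[1])
gamma = (2 * math.pi) ** (-d / 2)
lower = gamma * s2_max ** (-d / 2) * math.exp(-diam ** 2 / (2 * s2_min))

def density(y, mean, s2):
    # density of N(mean, s2 * I) at y, as a product of 1-D normal densities
    val = 1.0
    for yi, mi in zip(y, mean):
        val *= math.exp(-(yi - mi) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return val

# check the bound over a grid of points and means in the box, at the extreme
# admissible variances
pts = [(2.0 * a / 4, 3.0 * b / 4) for a in range(5) for b in range(5)]
for y in pts:
    for m in pts:
        for s2 in (s2_min, s2_max):
            assert density(y, m, s2) >= lower
```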
Proof of Proposition 2.7
As before, we first check that the above EP algorithm follows the GARSCO framework. For \(n=1,2,\ldots ,\mu \), we have \(k_n=1\) and \(\Lambda _{n,1} = Y_n\). Moreover, for all \(n \ge \mu +1\), we have \(k_n=2\) and
where t and i are the unique integers such that \(t \ge 1\), \(1 \le i \le \mu \) and \(n=t\mu +i\). Hence,
Since the selection of the new parent population in Step 3.4 is done in a greedy manner, it follows that for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), \(X(P_i(t-1))\) is a deterministic function of \(Y_1,Y_2,\ldots ,Y_{t\mu }\). This also implies that for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), \(X(P_i(t-1))\) is also a deterministic function of \(Y_1,Y_2,\ldots ,Y_{t\mu +i-1}\). Hence, for each integer \(t \ge 1\) and \(i=1,2,\ldots ,\mu \), we have \(Y_{t\mu +i} = \Phi _{(t-1)\mu +i}(\mathcal {E}_{t\mu +i-1}) + Z_{t\mu +i}\), for some deterministic function \(\Phi _{(t-1)\mu +i}\). Note that this implies that \(Y_{\mu +k} = \Phi _k(\mathcal {E}_{\mu +k-1}) + Z_{\mu +k}\) for each integer \(k \ge 1\),
where \(Z_{\mu +k}\) is a random vector whose conditional distribution given \(\sigma ({\mathcal {E}}_{\mu +k-1})\) is a normal distribution with mean vector 0 and diagonal covariance matrix \(\text{ diag }\left( \left( \sigma _{\mu +k}^{(1)}\right) ^2, \ldots , \left( \sigma _{\mu +k}^{(d)}\right) ^2 \right) \).
For each integer \(k \ge 1\) and \(j=1,2,\ldots ,d\), we have \(\left( \sigma _{\mu +k}^{(j)} \right) ^2 \ge \sigma _{\min }^2 > 0\). Define the subsequence \(\{n_k\}_{k \ge 1}\) by \(n_k:=\mu +k\) for all \(k \ge 1\). Then, we have \(Y_{n_k} = \Phi _k(\mathcal {E}_{(n_k)-1}) + W_k\), for all \(k \ge 1\), where \(W_k=Z_{n_k}\). Let \(\lambda _k\) be the smallest eigenvalue of \(\text{ Cov }(W_k)\). Since the eigenvalues of \(\text{ Cov }(W_k)\) are \(\left( \sigma _{\mu +k}^{(1)}\right) ^2, \ldots , \left( \sigma _{\mu +k}^{(d)}\right) ^2\), we have \(\displaystyle {\lambda _k = \min _{1 \le j \le d} \left( \sigma _{\mu +k}^{(j)} \right) ^2 \ge \sigma _{\min }^2}\), and so, \(\inf _{k \ge 1} \lambda _k \ge \sigma _{\min }^2 > 0\). Moreover, the conditional distribution of \(W_k\) given \(\sigma ({\mathcal {E}}_{(n_k)-1})\) is an elliptical distribution with conditional density given by
where \(\Psi (y)=e^{-y/2}\) and \(\gamma = (2\pi )^{-d/2}\). Again, \(\Psi (y)=e^{-y/2}\) is monotonically nonincreasing, and so, the conclusion follows from Proposition 2.6. \(\square \)
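A minimal sketch of the kind of EP algorithm covered by Proposition 2.7 follows (hypothetical objective and parameter values, not the paper's exact implementation). The essential feature is the floor \(\sigma _{\min }\) on the self-adaptive step sizes, which keeps the smallest eigenvalue of the (diagonal) mutation covariance bounded away from zero.

```python
import math
import random

# Sketch of evolutionary programming with a step-size floor (hypothetical
# parameters, not the paper's exact implementation). The floor sigma_min
# keeps every mutation variance at least sigma_min**2, so the smallest
# eigenvalue of the diagonal mutation covariance stays bounded below.
random.seed(1)
f = lambda x: sum(v * v for v in x)        # sphere objective on [-5, 5]^2
d, mu_pop, sigma_min = 2, 5, 0.05
low, high = -5.0, 5.0
tau = 1.0 / math.sqrt(2.0 * d)             # self-adaptation learning rate

# parent population: (point, per-coordinate step sizes)
parents = [([random.uniform(low, high) for _ in range(d)], [1.0] * d)
           for _ in range(mu_pop)]

for gen in range(200):
    offspring = []
    for x, s in parents:
        # log-normal self-adaptation of step sizes, floored at sigma_min
        s_new = [max(sigma_min, si * math.exp(tau * random.gauss(0.0, 1.0)))
                 for si in s]
        y = [min(high, max(low, xi + si * random.gauss(0.0, 1.0)))
             for xi, si in zip(x, s_new)]
        offspring.append((y, s_new))
    # greedy (mu + mu) selection of the next parent population
    parents = sorted(parents + offspring, key=lambda p: f(p[0]))[:mu_pop]

best = parents[0][0]
assert all(si >= sigma_min for _, s in parents for si in s)
assert f(best) < 0.2
```

Without the floor, the self-adaptive variances could decay to zero and the density lower bound required by Proposition 2.6 would fail.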
Cite this article
Regis, R.G. On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-objective Black-Box Optimization. J Optim Theory Appl 170, 932–959 (2016). https://doi.org/10.1007/s10957-016-0977-z
Keywords
- Constrained optimization
- Multi-objective optimization
- Random search
- Convergence
- Evolutionary programming