
Population model-based optimization

Original Paper · Journal of Global Optimization

Abstract

Model-based optimization methods are a class of stochastic search methods that iteratively find candidate solutions by generating samples from a parameterized probabilistic model on the solution space. To capture the multi-modality of objective functions better than traditional model-based methods, which use only a single model, we propose a framework that uses a population of models at every iteration, with an adaptive mechanism to propagate the population over iterations. The adaptive mechanism is derived from estimating the optimal parameter of the probabilistic model in a Bayesian manner, and thus provides a principled way to determine the diversity in the population of models. We provide theoretical justification for the convergence of this framework by showing that the posterior distribution of the parameter asymptotically converges to a degenerate distribution concentrating on the optimal parameter. Under this framework, we develop two practical algorithms by incorporating sequential Monte Carlo methods, and carry out numerical experiments to illustrate their performance.
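To make the framework concrete, the following is a minimal sketch (not the authors' implementation) of a population model-based optimization loop: a population of Gaussian sampling models proposes candidates, each model is weighted by the quality of its samples, and the population is propagated by weighted resampling plus shrinking random perturbations, in the spirit of sequential Monte Carlo. The objective, population size, and noise schedule are all illustrative.

```python
import numpy as np

# Illustrative multimodal objective to maximize.
def H(x):
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.8 * np.exp(-0.5 * (x + 2.0) ** 2)

rng = np.random.default_rng(0)
n_models, n_samples = 50, 20
mu = rng.uniform(-5.0, 5.0, n_models)  # population of Gaussian models (their means)
sigma = 1.0

for k in range(30):
    # Each model in the population generates candidate solutions.
    x = rng.normal(mu[:, None], sigma, size=(n_models, n_samples))
    scores = H(x).mean(axis=1)          # how promising each model's region is
    w = np.exp(scores - scores.max())   # weight models by sample quality
    w /= w.sum()
    # Propagate the population: resample models by weight (SMC-style),
    # then jitter with shrinking noise to maintain diversity.
    mu = mu[rng.choice(n_models, size=n_models, p=w)]
    mu += (0.5 * 0.9 ** k) * rng.uniform(-1.0, 1.0, n_models)

print("best model mean:", mu[np.argmax(H(mu))])
```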



Acknowledgments

This work was supported by the National Science Foundation under Grant CMMI-1130273, and by the Air Force Office of Scientific Research under YIP Grant FA-9550-12-1-0250. A preliminary conference version of this paper, which presents the PMO framework and part of the numerical results, appeared in the Proceedings of the 2013 Winter Simulation Conference.

Author information

Corresponding author: Enlu Zhou.

Appendix

1.1 Proof of Lemma 1

Proof

Define \(c_{k}\triangleq \mathbb {E}_{b_{k}}[H(X)]-\mathbb {E}_{\tilde{b}_{k}}[H(X)]\). First, we show that \(\mathbb {E}_{b_k}[H(X)]\ge \mathbb {E}_{b_{k-1}}[H(X)]-c_{k-1}\). By (11), the posterior distribution \(b_k(x,\theta )\) can be expressed by

$$\begin{aligned} b_k(x,\theta )\triangleq p(x,\theta |y_{1:k}) =\frac{\tilde{b}_{k-1}(x,\theta )\varphi (H(x)-y_k)}{\mathbb {E}_{\tilde{b}_{k-1}}\left[ \varphi (H(X)-y_k)\right] }. \end{aligned}$$
(17)

Then, the expectation of \(H(X)\) with respect to \(b_k(x,\theta )\) is represented by

$$\begin{aligned}&\mathbb {E}_{b_k}[H(X)]=\frac{\mathbb {E}_{\tilde{b}_{k-1}}\left[ H(X)\varphi (H(X)-y_k)\right] }{\mathbb {E}_{\tilde{b}_{k-1}}\left[ \varphi (H(X)-y_k)\right] }\\&\quad =\frac{\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])\varphi (H(X)-y_k)\right] +\mathbb {E}_{\tilde{b}_{k-1}}\left[ H(X)\right] \mathbb {E}_{\tilde{b}_{k-1}}\left[ \varphi (H(X)-y_k)\right] }{\mathbb {E}_{\tilde{b}_{k-1}}\left[ \varphi (H(X)-y_k)\right] }\\&\quad = \frac{\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])\varphi (H(X)-y_k)\right] }{\mathbb {E}_{\tilde{b}_{k-1}}\left[ \varphi (H(X)-y_k)\right] }+\mathbb {E}_{\tilde{b}_{k-1}}\left[ H(X)\right] \\&\quad \ge \mathbb {E}_{\tilde{b}_{k-1}}\left[ H(X)\right] , \end{aligned}$$

where the inequality follows from \(\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])\varphi (H(X)-y_k)\right] \ge 0\), which can be proved as follows.

By Assumption 1, \(\varphi (\cdot )\) is strictly increasing on its support. We have

$$\begin{aligned} \varphi (H(x)-y_k)-\varphi (\mathbb {E}_{\tilde{b}_{k-1}}[H(X)]-y_k)\le 0, \ \text {if}\ H(x)\le \mathbb {E}_{\tilde{b}_{k-1}}[H(X)], \end{aligned}$$

and

$$\begin{aligned} \varphi (H(x)-y_k)-\varphi (\mathbb {E}_{\tilde{b}_{k-1}}[H(X)]-y_k)> 0, \ \text {if}\ H(x)> \mathbb {E}_{\tilde{b}_{k-1}}[H(X)]. \end{aligned}$$

Then,

$$\begin{aligned}&\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])\varphi (H(X)-y_k)\right] \\&\quad =\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])(\varphi (H(X)-y_k)-\varphi (\mathbb {E}_{\tilde{b}_{k-1}}[H(X)]-y_k))\right] \\&\quad = \int _{H(x)\le \mathbb {E}_{\tilde{b}_{k-1}}[H(X)]}\int _{\Theta } (H(x)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])(\varphi (H(x)-y_k)\\&\qquad -\,\varphi (\mathbb {E}_{\tilde{b}_{k-1}}[H(X)]-y_k))\tilde{b}_{k-1}(x,\theta ) d\theta dx \\&\qquad +\,\int _{H(x)> \mathbb {E}_{\tilde{b}_{k-1}}[H(X)]}\int _{\Theta } (H(x)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])(\varphi (H(x)-y_k)\\&\qquad -\,\varphi (\mathbb {E}_{\tilde{b}_{k-1}}[H(X)]-y_k))\tilde{b}_{k-1}(x,\theta ) d\theta dx\\&\quad \ge 0. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathbb {E}_{b_k}[H(X)]\ge \mathbb {E}_{b_{k-1}}[H(X)]-c_{k-1}. \end{aligned}$$

Then, we have

$$\begin{aligned} a_k \triangleq \mathbb {E}_{b_k}[H(X)]+\sum _{i=1}^{k-1}c_i \ge \mathbb {E}_{b_{k-1}}[H(X)]+\sum _{i=1}^{k-2}c_i=a_{k-1}. \end{aligned}$$

Thus, \(\{a_k,\ k=1,2,\ldots \}\) is monotonically increasing. Moreover, \(\{a_k\}\) is upper bounded, since for all \(k\ge 1\), \(a_k \le H^{u} + \sum _{i=1}^{k-1}c_i\) and

$$\begin{aligned} \sum _{i=1}^{k-1}c_{i}\le & {} \sum _{i=1}^{k-1}|c_{i}| \\\le & {} \int _{\mathcal {X}}{H(x) \sum _{i=1}^{k-1}|b_{i}(x) - \tilde{b}_{i}(x)|dx} \\\le & {} \int _{\mathcal {X}}{H(x) \sum _{i=1}^{\infty }|b_{i}(x) - \tilde{b}_{i}(x)|dx} < \infty , \end{aligned}$$

where the last inequality follows from Assumption 4 and the fact that \(\mathcal {X}\) is compact. Since \(\{a_k\}\) is monotonically increasing and upper bounded, \(\lim _{k\rightarrow \infty }a_k\) exists. Using the dominated convergence theorem, we conclude that \(\sum _{i=1}^{\infty }c_i\) exists and

$$\begin{aligned} \sum _{i=1}^{\infty }c_i= & {} \sum _{i=1}^{\infty } \int _{\mathcal {X}}H(x)(b_{i}(x) - \tilde{b}_{i}(x))dx \\= & {} \int _{\mathcal {X}}H(x)\sum _{i=1}^{\infty }(b_{i}(x) - \tilde{b}_{i}(x))dx < \infty . \end{aligned}$$

Therefore, the limit of the right-hand side of \(\mathbb {E}_{b_k}[H(X)] = a_k - \sum _{i=1}^{k-1}c_i\) exists, which implies that \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]\) exists. \(\square \)
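The key step above, that reweighting by the increasing function \(\varphi (H(\cdot )-y_k)\) cannot decrease the expectation of \(H\), can be checked numerically on a particle approximation. The sketch below uses an illustrative \(H\) and \(\varphi \); any nonnegative \(\varphi \) that is increasing on its support exhibits the same behavior.

```python
import numpy as np

rng = np.random.default_rng(1)
H = lambda x: 3.0 - np.abs(x - 1.0)       # illustrative bounded objective
phi = lambda s: np.maximum(s, 0.0) ** 2   # nonnegative, increasing on its support

x = rng.uniform(-4.0, 4.0, 100_000)       # particles drawn from \tilde{b}_{k-1}
y_k = np.quantile(H(x), 0.7)              # an illustrative threshold level
w = phi(H(x) - y_k)                       # reweighting realizes b_k, cf. (17)

mean_before = H(x).mean()                 # approximates E_{\tilde{b}_{k-1}}[H(X)]
mean_after = np.average(H(x), weights=w)  # approximates E_{b_k}[H(X)]
assert mean_after >= mean_before          # the inequality established above
print(mean_before, mean_after)
```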

1.2 Proof of Theorem 1

Proof

Since \(y_k\) is monotonically increasing, upper bounded by \(H^*\), and updated only when \(\gamma _k\ge y_{k-1}+\epsilon \), there exists \(K<\infty \) such that \(y_k=y_K\), \(\forall k\ge K\). There are two cases to consider: (i) \(y_K=H^*\), and (ii) \(y_K<H^*\).

(i) Case 1: \(y_K=H^*\)

By (17), we have

$$\begin{aligned} b_k(x)=\tilde{b}_{k-1}(x)\frac{\varphi (H(x)-y_k)}{\mathbb {E}_{\tilde{b}_{k-1}}[\varphi (H(X)-y_k)]}. \end{aligned}$$
(18)

Since \(\varphi (\cdot )\) has support on \([0,H^u-H^l]\), we have \(\varphi (H(x)-y_K)=0\) whenever \(H(x)<H^*\), which gives

$$\begin{aligned} b_k(x)=0, \ \ \forall x\ne x^*, \ \forall k\ge K. \end{aligned}$$

Thus,

$$\begin{aligned} \mathbb {E}_{b_k}[H(X)]=H^*, \ \forall k\ge K, \end{aligned}$$

which completes the proof of case (i).

(ii) Case 2: \(y_K<H^*\)

By Lemma 1, the sequence \(\{\mathbb {E}_{b_k}[H(X)],\ k=1,2,\dots \}\) converges. Let \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H_*\); we will prove \(H_*=H^*\) by contradiction.

We define the set \(\mathcal {A}\) as

$$\begin{aligned} \mathcal {A}=\left\{ x\in \mathcal {X}:H(x)\ge H_*\right\} . \end{aligned}$$

For any fixed \(x\in \mathcal {A}\) and any finite \(i\), since \(\varphi (H(x)-y_i)>0\) and, by Assumption 3, \(\tilde{b}_i(x)>0\), we have \(b_i(x)>0\) and \(\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]>0\). Therefore, by induction we may represent (18) by

$$\begin{aligned} b_k(x)=b_{1}(x)\prod ^k_{i=2}\frac{\tilde{b}_{i-1}(x)}{b_{i-1}(x)}\frac{\varphi (H(x)-y_i)}{\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]}, \quad \forall x\in \mathcal {A}, \end{aligned}$$
(19)

and from Assumption 4, we have \(\lim _{i\rightarrow \infty }\frac{\tilde{b}_i(x)}{b_i(x)}=1\), almost everywhere in \(\mathcal {A}\).

Hence, almost everywhere in \(\mathcal {A}\),

$$\begin{aligned}&\lim _{i\rightarrow \infty }\frac{\tilde{b}_{i-1}(x)}{b_{i-1}(x)}\frac{\varphi (H(x)-y_i)}{\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]}\\&\quad =\lim _{i\rightarrow \infty }\frac{\tilde{b}_{i-1}(x)}{b_{i-1}(x)}\lim _{i\rightarrow \infty }\frac{\varphi (H(x)-y_i)}{\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]}\\&\quad =\frac{\lim _{i\rightarrow \infty }\varphi (H(x)-y_i)}{\lim _{i\rightarrow \infty }\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]}\\&\quad =\frac{\varphi (H(x)-y_K)}{\lim _{i\rightarrow \infty }\mathbb {E}_{b_{i-1}}[\varphi (H(X)-y_i)]} \end{aligned}$$

where the last equality follows from the continuity of \(\varphi (\cdot )\) under Assumption 1, and \(\lim _{i\rightarrow \infty }\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]=\lim _{i\rightarrow \infty }\mathbb {E}_{b_{i-1}}[\varphi (H(X)-y_i)]\) follows from the bounded convergence theorem and Assumption 4.

Suppose \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H_*<H^*\). Then an easy contradiction argument shows that

$$\begin{aligned} C\triangleq \lim _{k\rightarrow \infty }\int _{\{x:H(x)\le H_*\}}b_k(x)dx>0. \end{aligned}$$

We can write

$$\begin{aligned} \mathbb {E}_{b_{i-1}}[\varphi (H(X)-y_i)]= & {} \int _{\{x:H(x)\le H_*\}}\varphi (H(x)-y_i)b_{i-1}(x)dx\\&+\,\int _{\{x:H(x)> H_*\}}\varphi (H(x)-y_i)b_{i-1}(x)dx\\\le & {} \varphi (H_*-y_i)\int _{\{x:H(x)\le H_*\}}b_{i-1}(x)dx\\&+\,\varphi (H^*-y_i)\int _{\{x:H(x)> H_*\}}b_{i-1}(x)dx. \end{aligned}$$

Taking limits on both sides of the inequality as \(i\rightarrow \infty \), by the continuity of \(\varphi (\cdot )\) we have

$$\begin{aligned} \lim _{i\rightarrow \infty }\mathbb {E}_{b_{i-1}}[\varphi (H(X)-y_i)]\le & {} \varphi (H_*-y_K)\lim _{i\rightarrow \infty }\int _{\{x:H(x)\le H_*\}}b_{i-1}(x)dx\\&+\,\varphi (H^*-y_K)\lim _{i\rightarrow \infty }\int _{\{x:H(x)> H_*\}}b_{i-1}(x)dx\\= & {} \varphi (H_*-y_K)C+\varphi (H^*-y_K)(1-C). \end{aligned}$$

We define the set \(\mathcal {B}\) as

$$\begin{aligned} \mathcal {B}=\left\{ x\in \mathcal {A}:\varphi (H_*-y_K)C+\varphi (H^*-y_K)(1-C)<\varphi (H(x)-y_K)\right\} . \end{aligned}$$

Since \(C>0\) and \(\varphi (\cdot )\) is strictly increasing, \(\varphi (H_*-y_K)C+\varphi (H^*-y_K)(1-C)<\varphi (H^*-y_K)\). Thus, \(\mathcal {B}\) has strictly positive Lebesgue measure by Assumption 2.

Hence, almost everywhere in \(\mathcal {B}\),

$$\begin{aligned} \lim _{i\rightarrow \infty }\frac{\tilde{b}_{i-1}(x)}{b_{i-1}(x)}\frac{\varphi (H(x)-y_i)}{\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]} \ge \frac{\varphi (H(x)-y_K)}{\varphi (H_*-y_K)C+\varphi (H^*-y_K)(1-C)}>1, \end{aligned}$$

by the definition of \(\mathcal {B}\). From the inequality above and (19), we have

$$\begin{aligned} \lim _{k\rightarrow \infty }b_k(x)=\infty , \ \text {almost everywhere in} \ \mathcal {B}. \end{aligned}$$

By Fatou’s lemma and the positive Lebesgue measure of \(\mathcal {B}\), we have

$$\begin{aligned} \liminf _{k\rightarrow \infty }\int _{\mathcal {B}}b_k(x)dx\ge \int _{\mathcal {B}}\liminf _{k\rightarrow \infty }b_k(x)dx=\infty , \end{aligned}$$

which contradicts the fact that

$$\begin{aligned} \liminf _{k\rightarrow \infty }\int _{\mathcal {B}}b_k(x)dx\le \liminf _{k\rightarrow \infty }\int _{\mathcal {X}}b_k(x)dx=1. \end{aligned}$$

Therefore, we conclude that \(H_*=H^*\), and \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H^*\). \(\square \)
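Theorem 1 can also be observed numerically by iterating the reweighting (17) on a particle approximation with a threshold \(y_k\) that is raised only in steps of at least \(\epsilon \); the mean of \(H\) under the particle distribution then climbs toward \(H^*\). The sketch below uses an illustrative \(H\) and \(\varphi \) and strips away the parameter-propagation part of the algorithm to isolate this mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)
H = lambda x: np.cos(3.0 * x) * np.exp(-0.1 * x * x)  # illustrative; H* = 1 at x = 0
phi = lambda s: np.maximum(s, 0.0)                    # increasing on its support

x = rng.uniform(-4.0, 4.0, 20_000)   # particles approximating b_0
y, eps = H(x).min(), 0.01

for k in range(40):
    gamma = np.quantile(H(x), 0.8)
    if gamma >= y + eps:             # y_k is raised only by at least eps, so it
        y = gamma                    # freezes at some finite K, as in the proof
    w = phi(H(x) - y) + 1e-12        # tiny floor avoids an all-zero weight vector
    x = rng.choice(x, size=x.size, p=w / w.sum())  # resampled particles follow b_k

print(H(x).mean())  # close to H* = 1
```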

1.3 Proof of Theorem 2

Proof

We prove this theorem by showing that Assumption 4 is satisfied under Assumptions 5–7.

By (8) and (10), we have

$$\begin{aligned} \tilde{b}_k(x,\theta )=p(x|\theta ,y_{1:k})\int _{\Theta }p(\theta |\theta _k,y_{1:k})b_k(\theta _k)d\theta _k. \end{aligned}$$
(20)

For any fixed \(\theta \in \Theta \), let \(S_k^m(\theta )=\left\{ \theta _k \in \Theta : |\theta _k^i-\theta ^i|<\delta _k, i=1,\ldots ,m \right\} \), where \(\theta ^i\) denotes the \(i\)-th element of \(\theta \) and \(m\) is the dimension of \(\theta \). Let \(V(S_k^m(\theta ))\) denote the volume of \(S_k^m(\theta )\). By Assumption 5, the artificial noise \(\Gamma _k\) is uniformly distributed on \([-\delta _k,\delta _k]^m\), so the p.d.f. \(p(\theta |\theta _k,y_{1:k})\) is

$$\begin{aligned} p(\theta |\theta _k,y_{1:k})=\frac{\mathbb {I}_{\{\theta _k\in S_k^m(\theta )\}}}{V(S_k^m(\theta ))}. \end{aligned}$$
(21)
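In particle terms, (21) says that the smoothed density \(\tilde{b}_k\) is realized by adding independent uniform noise on \([-\delta _k,\delta _k]^m\) to each parameter particle; a minimal sketch (dimensions and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 10_000
theta = rng.normal(size=(n, m))  # particles approximating b_k(theta)

delta_k = 0.05
# Uniform kernel on [-delta_k, delta_k]^m, with V(S_k^m(theta)) = (2*delta_k)**m.
theta_smoothed = theta + rng.uniform(-delta_k, delta_k, size=(n, m))
# theta_smoothed now approximates \tilde{b}_k(theta).
```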

Plugging (21) into (20), we have

$$\begin{aligned} \tilde{b}_k(x,\theta )=p(x|\theta ,y_{1:k})\int _{S_k^m(\theta )}\frac{1}{V(S_k^m(\theta ))}b_k(\theta _k)d\theta _k. \end{aligned}$$

The joint posterior p.d.f. \(b_k(x,\theta )\) can be represented by

$$\begin{aligned} b_k(x,\theta )=p(x|\theta ,y_{1:k})b_k(\theta ). \end{aligned}$$

Then,

$$\begin{aligned} |b_k(x,\theta )-\tilde{b}_k(x,\theta )|= & {} p(x|\theta ,y_{1:k})\left| b_k(\theta )-\int _{S_k^m(\theta )}\frac{1}{V(S_k^m(\theta ))}b_k(\theta _k)d\theta _k\right| \\\le & {} \frac{p(x|\theta ,y_{1:k})}{V(S_k^m(\theta ))}\int _{S_k^m(\theta )}\left| b_k(\theta )-b_k(\theta _k)\right| d\theta _k. \end{aligned}$$

By Assumption 6, \(b_k(\theta )\) is continuous on the closure of \(S_k^m(\theta )\) and differentiable on the open set \(S_k^m(\theta )=\{\theta _k \in \Theta :|\theta _k^i-\theta ^i|<\delta _k,\ i=1,\ldots ,m\}\). By the mean value theorem, \(\exists \ \xi \in S_k^m(\theta )\), such that

$$\begin{aligned} \left| b_k(\theta )-b_k(\theta _k)\right| \le \left\| \nabla _\theta b_k(\xi )\right\| _2\left\| \theta -\theta _k\right\| _2\le \left\| \nabla _\theta b_k(\xi )\right\| m\delta _k, \end{aligned}$$

where \(\Vert \cdot \Vert _2\) denotes the Euclidean norm and \(\Vert \cdot \Vert \) denotes the maximum norm.

Define

$$\begin{aligned} d_k=\sup _{\xi \in \Theta }\left( \left\| \nabla _\theta b_k(\xi )\right\| \right) m\delta _k, \end{aligned}$$

and let \(V(\Theta )\) denote the volume of \(\Theta \), which is bounded. Now, since

$$\begin{aligned} |b_k(x)-\tilde{b}_k(x)|\le \int _{\Theta }|b_k(x,\theta )-\tilde{b}_k(x,\theta )|d\theta , \end{aligned}$$

to prove \(\sum _{k=1}^{\infty }|b_k(x)-\tilde{b}_k(x)|<\infty \) almost everywhere in \(\mathcal {X}\), it is sufficient to show \(\sum _{k=1}^{\infty } d_k <\infty \).

By (6) and (17), we have

$$\begin{aligned} b_k(\theta )=\frac{\int _{\mathcal {X}}\tilde{b}_{k-1}(x,\theta )\varphi (H(x)-y_k)dx}{\mathbb {E}_{\tilde{b}_{k-1}}[\varphi (H(X)-y_k)]}. \end{aligned}$$

The gradient of \(b_k(\theta )\) is

$$\begin{aligned} \nabla _\theta b_k(\theta )= & {} \frac{\nabla _\theta \int _{\mathcal {X}}\tilde{b}_{k-1}(x,\theta )\varphi (H(x)-y_k)dx}{\mathbb {E}_{\tilde{b}_{k-1}}[\varphi (H(X)-y_k)]}\\= & {} \frac{\int _{\mathcal {X}}\nabla _\theta \tilde{b}_{k-1}(x,\theta )\varphi (H(x)-y_k)dx}{\mathbb {E}_{\tilde{b}_{k-1}}[\varphi (H(X)-y_k)]}. \end{aligned}$$

Since there exists \(K<\infty \) such that \(y_k=y_K\), \(\forall k\ge K\), the gradient of \(b_k(\theta )\) can be bounded as follows:

\(\forall k\ge K,\)

$$\begin{aligned} \nonumber \nabla _\theta b_k(\theta )\le & {} \frac{\varphi (H^u-y_K)}{\varphi (0)}\int _{\mathcal {X}}\nabla _\theta \tilde{b}_{k-1}(x,\theta )dx\\= & {} \frac{\varphi (H^u-y_K)}{\varphi (0)}\nabla _\theta \tilde{b}_{k-1}(\theta ), \end{aligned}$$
(22)

\(\forall k< K,\)

$$\begin{aligned} \nonumber \nabla _\theta b_k(\theta )\le & {} \frac{\varphi (H^u-H^l)}{\varphi (0)}\int _{\mathcal {X}}\nabla _\theta \tilde{b}_{k-1}(x,\theta )dx\\= & {} \frac{\varphi (H^u-H^l)}{\varphi (0)}\nabla _\theta \tilde{b}_{k-1}(\theta ), \end{aligned}$$
(23)

where the inequalities follow from \(\varphi (0)\le \varphi (H(x)-y_k)\le \varphi (H^u-y_K)\), \(\forall k\ge K\), and \(\varphi (0)\le \varphi (H(x)-y_k)\le \varphi (H^u-H^l)\), \(\forall k< K\). Taking the maximum norm on both sides of (22) and (23), we have the following inequalities:

\(\forall k\ge K,\)

$$\begin{aligned} \Vert \nabla _\theta b_{k}(\theta )\Vert \le \frac{\varphi (H^u-y_K)}{\varphi (0)}\Vert \nabla _\theta \tilde{b}_{k-1}(\theta )\Vert , \end{aligned}$$
(24)

\(\forall k< K,\)

$$\begin{aligned} \Vert \nabla _\theta b_{k}(\theta )\Vert \le \frac{\varphi (H^u-H^l)}{\varphi (0)}\Vert \nabla _\theta \tilde{b}_{k-1}(\theta )\Vert . \end{aligned}$$
(25)

Next, we prove \(\exists \ \eta _k\in \Theta \), such that \(\Vert \nabla _\theta \tilde{b}_{k}(\theta )\Vert \le \Vert \nabla _\theta b_{k}(\eta _k)\Vert \), where \(\eta _k\) is dependent on \(\theta \).

Let \(\vec {\varepsilon }^i=(0,0,\ldots ,0,\varepsilon ,0,\ldots ,0)\), where the \(i\)-th element of \(\vec {\varepsilon }^i\) is \(\varepsilon \) and the other elements are 0. We denote \(\theta =(\theta ^1,\theta ^2,\ldots ,\theta ^m)\) and \(\bar{\theta }^i=(\theta ^1,\theta ^2,\ldots ,\theta ^{i-1},\theta ^{i+1},\ldots ,\theta ^m)\). With this notation, \(\tilde{b}_k(\theta )\) can alternatively be represented by

$$\begin{aligned} \tilde{b}_k(\theta )=\int _{S_k^m(\theta )}\frac{b_k(\theta _k)}{V(S_k^m(\theta ))}d\theta _k=\int _{\theta ^i-\delta _k}^{\theta ^i+\delta _k}\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\bar{\theta }_k^id\theta _k^i, \end{aligned}$$

where \(S_k^{m-1}(\bar{\theta }^i)=\{\bar{\theta }^i_k\in \Theta : |\theta ^j_k-\theta ^j|<\delta _k,\ j=1,\ldots ,i-1,i+1,\ldots ,m\}\). Then,

$$\begin{aligned} |\tilde{b}_k(\theta +\vec {\varepsilon }^i)-\tilde{b}_k(\theta )|= & {} \left| \int _{\theta ^i-\delta _k+\varepsilon }^{\theta ^i+\delta _k+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\theta _k-\int _{\theta ^i-\delta _k}^{\theta ^i+\delta _k}\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\theta _k \right| \\= & {} \left| \int _{\theta ^i+\delta _k}^{\theta ^i+\delta _k+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\theta _k-\int _{\theta ^i-\delta _k}^{\theta ^i-\delta _k+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\theta _k \right| \\= & {} \left| \int _{\theta ^i}^{\theta ^i+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{b_k(\theta _k^i+\delta _k,\bar{\theta }_k^i)-b_k(\theta _k^i-\delta _k,\bar{\theta }_k^i)}{V(S_k^m(\theta ))}d\theta _k \right| \\\le & {} \int _{\theta ^i}^{\theta ^i+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{\left| b_k(\theta _k^i+\delta _k,\bar{\theta }_k^i)-b_k(\theta _k^i-\delta _k,\bar{\theta }_k^i)\right| }{V(S_k^m(\theta ))}d\theta _k. \end{aligned}$$

Because \(S_k^m(\theta )\) is compact, \(\exists \ t\in S_k^m(\theta )\) such that, \(\forall \theta _k\in S_k^m(\theta )\), we have

$$\begin{aligned} |b_k(\theta _k^i+\delta _k,\bar{\theta }_k^i)-b_k(\theta _k^i-\delta _k,\bar{\theta }_k^i)|\le |b_k(t^i+\delta _k,\bar{t}^i)-b_k(t^i-\delta _k,\bar{t}^i)|. \end{aligned}$$

Thus,

$$\begin{aligned} |\tilde{b}_k(\theta +\vec {\varepsilon }^i)-\tilde{b}_k(\theta )|\le \int _{\theta ^i}^{\theta ^i+\varepsilon }\int _{S_k^{m-1}(\bar{\theta }^i)}\frac{|b_k(t^i+\delta _k,\bar{t}^i)-b_k(t^i-\delta _k,\bar{t}^i)|}{V(S_k^m(\theta ))}d\theta _k. \end{aligned}$$

By the mean value theorem, \(\exists \ \tau \in \Theta \), such that

$$\begin{aligned} |b_k(t^i+\delta _k,\bar{t}^i)-b_k(t^i-\delta _k,\bar{t}^i)|=\left| \frac{\partial b_k}{\partial \theta ^i}(\tau )\right| 2\delta _k. \end{aligned}$$

Thus,

$$\begin{aligned} |\tilde{b}_k(\theta +\vec {\varepsilon }^i)-\tilde{b}_k(\theta )|\le \frac{\varepsilon }{2\delta _k}\left| \frac{\partial b_k}{\partial \theta ^i}(\tau )\right| 2\delta _k=\varepsilon \left| \frac{\partial b_k}{\partial \theta ^i}(\tau )\right| . \end{aligned}$$

By the definition of the derivative, we have

$$\begin{aligned} \left| \frac{\partial \tilde{b}_k(\theta )}{\partial \theta ^i}\right| =\lim _{\varepsilon \rightarrow 0}\frac{|\tilde{b}_k(\theta +\vec {\varepsilon }^i)-\tilde{b}_k(\theta )|}{\varepsilon }\le \left| \frac{\partial b_k}{\partial \theta ^i}(\tau )\right| . \end{aligned}$$

It is easy to observe from the above inequality that \(\exists \eta _k \in \Theta \), such that

$$\begin{aligned} \Vert \nabla _\theta \tilde{b}_k(\theta )\Vert \le \Vert \nabla _\theta b_k(\eta _k)\Vert . \end{aligned}$$
(26)

By (24)–(26), we may bound \(\Vert \nabla _\theta b_k(\theta )\Vert \) in terms of \(\Vert \nabla _\theta b_{k-1}(\cdot )\Vert \). Therefore, \(\exists \ \eta _{k-1}\in \Theta \), such that

$$\begin{aligned} \Vert \nabla _\theta b_k(\theta )\Vert \le \frac{\varphi (H^u-y_K)}{\varphi (0)}\Vert \nabla _\theta \tilde{b}_{k-1}(\theta )\Vert \le \frac{\varphi (H^u-y_K)}{\varphi (0)}\Vert \nabla _\theta b_{k-1}(\eta _{k-1})\Vert ,\quad \forall k\ge K, \end{aligned}$$

and

$$\begin{aligned} \Vert \nabla _\theta b_k(\theta )\Vert \le \frac{\varphi (H^u-H^l)}{\varphi (0)}\Vert \nabla _\theta \tilde{b}_{k-1}(\theta )\Vert \le \frac{\varphi (H^u-H^l)}{\varphi (0)}\Vert \nabla _\theta b_{k-1}(\eta _{k-1})\Vert ,\quad \forall k< K. \end{aligned}$$

By induction, we have

$$\begin{aligned} \Vert \nabla _\theta b_k(\theta )\Vert \le \left( \frac{\varphi (H^u-y_K)}{\varphi (0)}\right) ^{k-K}\left( \frac{\varphi (H^u-H^l)}{\varphi (0)}\right) ^{K}\Vert \nabla _\theta b_0(\eta _0)\Vert ,\quad \forall k\ge K. \end{aligned}$$

By Assumption 7, we have \(\Vert \nabla _\theta b_0(\theta )\Vert \le A\); hence \(d_k\) is bounded above by

$$\begin{aligned} d_k\le A\left( \frac{\varphi (H^u-y_K)}{\varphi (0)}\right) ^{k-K}\left( \frac{\varphi (H^u-H^l)}{\varphi (0)}\right) ^{K}m\delta _k, \quad \forall k\ge K. \end{aligned}$$

If \(\delta _k=\delta \alpha ^k\) and \(\alpha <\frac{\varphi (0)}{\varphi (H^u-y_K)}\), we have \(\sum _{k=1}^\infty d_k<\infty \), which implies that

$$\begin{aligned} \sum _{k=1}^{\infty }|b_k(x)-\tilde{b}_k(x)|\le \sum _{k=1}^{\infty }\int _{\Theta }|b_k(x,\theta )-\tilde{b}_k(x,\theta )|d\theta <\infty . \end{aligned}$$

Therefore, \(\sum _{k=1}^{\infty }|b_k(x)-\tilde{b}_k(x)|<\infty \) almost everywhere in \(\mathcal {X}\), which is exactly Assumption 4. \(\square \)
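The summability condition at the end of the proof translates directly into an implementable schedule for the artificial noise: choose \(\delta _k=\delta \alpha ^k\) with \(\alpha \) strictly below \(\varphi (0)/\varphi (H^u-y_K)\). Since that ratio is unknown in practice, an implementation would pick \(\alpha \) conservatively; a sketch with illustrative values:

```python
import numpy as np

def noise_schedule(delta0, alpha, n_iters):
    """Geometrically decaying noise radii delta_k = delta0 * alpha**k."""
    return delta0 * alpha ** np.arange(n_iters)

rho = 1.2                            # stand-in for phi(H^u - y_K) / phi(0)
alpha = 0.5                          # any alpha < 1/rho makes the sum of d_k finite
deltas = noise_schedule(0.5, alpha, 200)
d = rho ** np.arange(200) * deltas   # d_k up to constant factors, cf. the bound above
print(deltas[:3], d.sum())           # the series converges since rho * alpha < 1
```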


Cite this article

Chen, X., Zhou, E. Population model-based optimization. J Glob Optim 63, 125–148 (2015). https://doi.org/10.1007/s10898-015-0288-1
