Abstract
A common approach to global optimization is to combine local optimization methods with random restarts. Restarts have long been used to boost performance: they offer a means to avoid slow progress when exploiting a potentially good solution, and they enable the discovery of multiple local solutions, thus improving the overall quality of the returned solution. A multi-start method is one way to integrate local and global approaches, where the global search itself is used to restart the local search. Bayesian optimization methods aim to find global optima of functions that can only be evaluated point-wise by means of a possibly expensive oracle. We propose the stochastic optimization with adaptive restart (SOAR) framework, which uses the predictive capability of Gaussian process models to adaptively restart local search and to intelligently select restart locations using the information collected so far. This approach attempts to balance exploitation and exploration of the solution space. We study the asymptotic convergence of SOAR to a global optimum and empirically evaluate its performance through a specific implementation that uses the trust region method as the local search component. Numerical experiments show that the proposed algorithm outperforms existing methodologies over a suite of test problems of varying dimension under a finite budget of function evaluations.




References
Ankenman, B., Nelson, B.L., Staum, J.: Stochastic Kriging for simulation metamodeling. Oper. Res. 58(2), 371–382 (2010)
Atkinson, A.: A segmented algorithm for simulated annealing. Stat. Comput. 31, 635–672 (1992)
Betrò, B., Schoen, F.: Sequential stopping rules for the multistart algorithm in global optimisation. Math. Prog. 38(3), 271–286 (1987)
Betrò, B., Schoen, F.: A stochastic technique for global optimization. Comput. Math. Appl. 21(6–7), 127–133 (1991)
Betrò, B., Schoen, F.: Optimal and sub-optimal stopping rules for the multistart algorithm in global optimization. Math. Prog. 57(1–3), 445–458 (1992)
Bouhlel, M.A., Bartoli, N., Regis, R.G., Otsmane, A., Morlier, J.: Efficient global optimization for high-dimensional constrained problems by using the Kriging models combined with the partial least squares method. Eng. Optim. 50(12), 2038–2053 (2018)
Bouhmala, N.: Combining simulated annealing with local search heuristic for max-sat. J. Heur. 25(1), 47–69 (2019)
Calvin, J., Žilinskas, A.: On the convergence of the p-algorithm for one-dimensional global optimization of smooth functions. J. Optim. Theory Appl. 102(3), 479–495 (1999)
Chang, K.H., Hong, L.J., Wan, H.: Stochastic trust-region response-surface method (STRONG): a new response-surface framework for simulation optimization. INFORMS J. Comput. 25(2), 230–243 (2013)
Chen, C.H.: Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, vol. 1. World Scientific, Singapore (2010)
Conn, A.R., Gould, N.I., Toint, P.L.: Trust Region Methods. SIAM, Philadelphia (2000)
Efron, B., Tibshirani, R.: Improvements on cross-validation: the 632+ bootstrap method. J. Am. Stat. Assoc. 92(438), 548–560 (1997)
Fu, M.C.: Handbook of Simulation Optimization, vol. 216. Springer, Berlin (2015)
Glidewell, M., Ng, K., Hensel, E.: A combinatorial optimization approach as a pre-processor for impedance tomography. In: Proceedings of the Annual Conference of the IEEE/Engineering in Medicine and Biology Society (1991)
Hart, W.E.: Sequential stopping rules for random optimization methods with applications to multistart local search. SIAM J. Optim. 9(1), 270–290 (1998)
Hu, X., Shonkwiler, R., Spruill, M.: Random restarts in global optimization. Technical report, Georgia Institute of Technology (1994)
Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
Krityakierne, T., Shoemaker, C.A.: SOMS: surrogate multistart algorithm for use with nonlinear programming for global optimization. Int. Trans. Oper. Res. 24(5), 1139–1172 (2017)
Lagaris, I.E., Tsoulos, I.G.: Stopping rules for box-constrained stochastic global optimization. Appl. Math. Comput. 197(2), 622–632 (2008)
Li, H., Lim, A.: A meta-heuristic for the pickup and delivery problem with time windows. In: Proceedings of the 13th IEEE International Conference on Tools with Artificial Intelligence, pp 160–167 (2001)
Locatelli, M.: Bayesian algorithms for one-dimensional global optimization. J. Global Optim. 10(1), 57–76 (1997)
Locatelli, M.: A note on the Griewank test function. J. Global Optim. 25(2), 169–174 (2003)
Locatelli, M., Schoen, F.: Global optimization based on local searches. Ann. Oper. Res. 240(1), 251–270 (2016)
Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Inf. Process. Lett. 47(4), 173–180 (1993)
Luersen, M.A., Le Riche, R.: Globalized Nelder–Mead method for engineering optimization. Comput. Struct. 82(23), 2251–2260 (2004)
Mahinthakumar, G., Sayeed, M.: Hybrid genetic algorithm–local search methods for solving groundwater source identification inverse problems. J. Water Resour. Plan. Manag. 131(1), 45–57 (2005)
Martí, R., Lozano, J.A., Mendiburu, A., Hernando, L.: Multi-start methods. In: Handbook of Heuristics, pp 1–21. Springer (2016)
Martin, O.C., Otto, S.W.: Combining simulated annealing with local search heuristics. Ann. Oper. Res. 63(1), 57–75 (1996)
Mathesen, L., Pedrielli, G., Ng, S.H.: Trust region based stochastic optimization with adaptive restart: a family of global optimization algorithms. In: 2017 Winter Simulation Conference (WSC), pp 2104–2115 (2017). https://doi.org/10.1109/WSC.2017.8247943
Müller, J., Day, M.: Surrogate optimization of computationally expensive black-box problems with hidden constraints. INFORMS J. Comput. 31(4), 689–702 (2019)
Murphy, M., Baker, E.: GLO: Global local optimizer. LLNL unclassified code 960007 (1995)
Neumann, F., Witt, C.: Runtime analysis of a simple ant colony optimization algorithm. Algorithmica 54(2), 243 (2009)
Nocedal, J., Wright, S.J.: Trust-region methods. In: Numerical Optimization, pp 66–100. Springer, New York (2006)
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Ohsaki, M., Yamakawa, M.: Stopping rule of multi-start local search for structural optimization. Struct. Multidiscip. Optim. 57(2), 595–603 (2018)
Okamoto, M., Nonaka, T., Ochiai, S., Tominaga, D.: Nonlinear numerical optimization with use of a hybrid genetic algorithm incorporating the modified Powell method. Appl. Math. Comput. 91(1), 63–72 (1998)
Pardalos, P.M., Romeijn, H.E.: Handbook of Global Optimization, vol. 2. Springer, Berlin (2013)
Peri, D., Tinti, F.: A multistart gradient-based algorithm with surrogate model for global optimization. Commun. Appl. Ind. Math. 3(1), 393 (2012)
Ranjan, P., Haynes, R., Karsten, R.: A computationally stable approach to Gaussian process interpolation of deterministic computer simulation data. Technometrics 53(4), 366–378 (2011)
Regis, R.G., Shoemaker, C.A.: Improved strategies for radial basis function methods for global optimization. J. Global Optim. 37(1), 113–135 (2007)
Regis, R.G., Shoemaker, C.A.: Parallel radial basis function methods for the global optimization of expensive functions. Eur. J. Oper. Res. 182(2), 514–535 (2007)
Regis, R.G., Shoemaker, C.A.: Parallel stochastic global optimization using radial basis functions. INFORMS J. Comput. 21(3), 411–426 (2009)
Regis, R.G., Shoemaker, C.A.: A quasi-multistart framework for global optimization of expensive functions using response surface models. J. Global Optim. 56(4), 1719–1753 (2013)
Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Experiments. Springer, Berlin (2013)
Schoen, F.: Stochastic techniques for global optimization: a survey of recent advances. J. Global Optim. 1(3), 207–228 (1991)
Schoen, F.: Two-phase methods for global optimization. In: Handbook of Global Optimization. Springer, pp 151–177 (2002)
Schoen, F.: Two-phase methods for global optimization. In: Pardalos, P., Romeijn, H. (eds.) Handbook of Global Optimization, vol. 2, pp. 151–177. Kluwer Academic Publishers, Dordrecht (2015)
Shang, Y., Wan, Y., Fromherz, M.P., Crawford, L.S.: Toward adaptive cooperation between global and local solvers for continuous constraint problems. In: Proceedings of the CP’01 Workshop on Cooperative Solvers in Constraint Programming (2001)
Spall, J.C.: Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, vol. 65. Wiley, New York (2005)
Spall, J.C.: Stochastic optimization. In: Handbook of Computational Statistics. Springer, pp 173–201 (2012)
Theodosopoulos, T.: Some remarks on the optimal level of randomization in global optimization. (2004) arXiv preprint arXiv:math/0406095
Torii, A.J., Lopez, R.H., Luersen, M.A.: A local-restart coupled strategy for simultaneous sizing and geometry truss optimization. Latin Am. J. Solids Struct. 8(3), 335–349 (2011)
Van Harmelen, F., Lifschitz, V., Porter, B.: Handbook of Knowledge Representation, vol. 1. Elsevier, London (2008)
Vehtari, A., Gelman, A., Gabry, J.: Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27(5), 1413–1432 (2017)
Voglis, C., Lagaris, I.E.: Towards ideal multistart: a stochastic approach for locating the minima of a continuous function inside a bounded domain. Appl. Math. Comput. 213(1), 216–229 (2009)
Yang, X.S.: Nature-Inspired Optimization Algorithms. Elsevier, London (2014)
Zabinsky, Z.B.: Stochastic Adaptive Search for Global Optimization. Kluwer Academic Publishers, Berlin (2003)
Zabinsky, Z.B.: Stochastic search methods for global optimization. In: Wiley Encyclopedia of Operations Research and Management Science. Wiley (2011)
Zabinsky, Z.B., Bulger, D., Khompatraporn, C.: Stopping and restarting strategy for stochastic sequential search in global optimization. J. Global Optim. 46(2), 273–286 (2010)
Zafar, A., Ghafoor, U., Yaqub, M.A., Hong, K.: Determination of the parameters in the designed hemodynamic response function using Nelder-Mead algorithm. In: 2018 18th International Conference on Control, Automation and Systems (ICCAS), pp 1135–1140 (2018)
Appendices
Appendix A: details of Gaussian processes
The basic idea behind meta-models such as Gaussian processes is that any smooth function \(f(\textit{\textbf{x}})\) can be interpreted as a realization of a stationary Gaussian process. Given n points in \(\mathbb {X}\), \(\mathbb {S}=\{\textit{\textbf{x}}_1, \ldots , \textit{\textbf{x}}_n\}\), and the associated function values \(f(\textit{\textbf{x}}_i)\), for \(i=1, \ldots , n\), we construct a stationary Gaussian process \(F(\textit{\textbf{x}})=\mu + Z(\textit{\textbf{x}})\) around the smooth function \(f(\textit{\textbf{x}})\) using the n observed points. Here, \(\mu \) is the constant mean (more complex models can be considered [44]), and \(Z(\textit{\textbf{x}})\) is the Gaussian process \(Z(\textit{\textbf{x}})\sim GP(0,\tau ^2\mathbf {R})\) with spatial correlation matrix \(\mathbf {R}\) and overall process variance \(\tau ^2\).
We adopt the Gaussian correlation function, such that \(R_{ij} = \prod _{l=1}^d e^{-(\theta _l|x_{il}-x_{jl}|)^2}\), for \(i,j = 1, \ldots , n\), where \(\varvec{\theta }\) is the d-dimensional vector of hyperparameters collecting the correlation factors that control the smoothness of the predictor along each coordinate. The two parameters \(\mu \) and \(\tau ^2\) can be estimated through the following maximum likelihood estimators [44],
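\[
\hat{\mu } = \frac{\textit{\textbf{1}}_n^{\top }\mathbf {R}^{-1}\textit{\textbf{f}}}{\textit{\textbf{1}}_n^{\top }\mathbf {R}^{-1}\textit{\textbf{1}}_n}, \qquad \hat{\tau }^{2} = \frac{\left( \textit{\textbf{f}}-\textit{\textbf{1}}_n\hat{\mu }\right) ^{\top }\mathbf {R}^{-1}\left( \textit{\textbf{f}}-\textit{\textbf{1}}_n\hat{\mu }\right) }{n},
\]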
where \(\textit{\textbf{f}}\) represents the n-dimensional vector of the function evaluations at the sampled points within the set \(\mathbb {S}=\left\{ \textit{\textbf{x}}_{1},\textit{\textbf{x}}_{2},\ldots ,\textit{\textbf{x}}_{n}\right\} \), and \( \textit{\textbf{1}}_n\) is an n-vector of ones.
Thus, we can build a predictor function \(\hat{f}(\textit{\textbf{x}})\) and model variance \(\hat{s}^2(\textit{\textbf{x}})\) for \(\textit{\textbf{x}}\in \mathbb {X}\) following [44].
The predictor function is defined for any \(\textit{\textbf{x}} \in {\mathbb {X}}\):
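\[
\hat{f}(\textit{\textbf{x}}) = \hat{\mu } + \mathbf {r}^{\top }\mathbf {R}^{-1}\left( \textit{\textbf{f}}-\textit{\textbf{1}}_n\hat{\mu }\right) ,
\]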
with a corresponding predictive model variance of:
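\[
\hat{s}^{2}(\textit{\textbf{x}}) = \hat{\tau }^{2}\left( 1 - \mathbf {r}^{\top }\mathbf {R}^{-1}\mathbf {r} + \frac{\left( 1-\textit{\textbf{1}}_n^{\top }\mathbf {R}^{-1}\mathbf {r}\right) ^{2}}{\textit{\textbf{1}}_n^{\top }\mathbf {R}^{-1}\textit{\textbf{1}}_n}\right) ,
\]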
where \(\mathbf {r}\) is the n-dimensional vector of correlations between the model error at the prediction location \(\textit{\textbf{x}}\) and the model errors at the sampled locations \(\textit{\textbf{x}}_i\in \mathbb {S}\):
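\[
r_{i}(\textit{\textbf{x}}) = \prod _{l=1}^{d} e^{-(\theta _l|x_{l}-x_{il}|)^{2}}, \quad i = 1, \ldots , n.
\]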
The interested reader can refer to [44] and the references therein.
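To make the formulas above concrete, the following is a minimal NumPy sketch of the ordinary-kriging predictor and predictive variance. It is illustrative only: the function name and interface are hypothetical, the hyperparameters \(\varvec{\theta }\) are assumed to be already estimated, and it is not the implementation used in the paper.

```python
import numpy as np

def kriging_predict(X, f, x_new, theta):
    """Ordinary-kriging prediction at x_new (illustrative sketch).

    X      : (n, d) array of sampled locations
    f      : (n,) array of observed function values
    x_new  : (d,) location at which to predict
    theta  : (d,) correlation hyperparameters (assumed already estimated)
    Returns (f_hat, s2_hat): predictor and predictive variance at x_new.
    """
    n = X.shape[0]
    ones = np.ones(n)

    # Gaussian correlation: R_ij = prod_l exp(-(theta_l |x_il - x_jl|)^2)
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.sum((theta * diff) ** 2, axis=2))
    R += 1e-10 * np.eye(n)                      # small nugget for numerical stability

    # Correlations between x_new and the sampled locations
    r = np.exp(-np.sum((theta * (X - x_new)) ** 2, axis=1))

    R_inv = np.linalg.inv(R)
    mu_hat = (ones @ R_inv @ f) / (ones @ R_inv @ ones)
    resid = f - mu_hat * ones
    tau2_hat = (resid @ R_inv @ resid) / n

    f_hat = mu_hat + r @ R_inv @ resid
    s2_hat = tau2_hat * (1.0 - r @ R_inv @ r
                         + (1.0 - ones @ R_inv @ r) ** 2 / (ones @ R_inv @ ones))
    return f_hat, s2_hat
```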
Appendix B: details of the trust region algorithm
The implementation of SOAR in this paper uses the trust region method [11, 33] as its local search, an iterative derivative-free optimization approach that, under appropriate conditions, is guaranteed to converge to a local minimum [33]. An iteration of the trust region algorithm approximates the objective function (usually with a linear or quadratic model) around a location referred to as the centroid of the search, and minimizes the approximating function over a “trust region” (usually a simplex or ellipsoid). The approximation makes this minimization problem easy to solve. Its solution is used as the centroid for the next iteration if the linear/quadratic surface is a good approximation of the true function within the trust region.
More specifically, the \(\ell _{k}^\mathrm{th}\) iteration of our implementation of the trust region method starts with a centroid \(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\) of an associated hypercube trust region, denoted \(\mathbb {R}_{\ell _{k}}\), whose side length is \(r_{\ell _{k}}\). An iteration of the algorithm results in a centroid update and/or a trust region size update, achieved by minimizing a quadratic model, \(\hat{f}_Q(\textit{\textbf{x}})\). The quadratic model is given by:
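\[
\hat{f}_Q(\textit{\textbf{x}}) = f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}) + \nabla f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}})^{\top }\left( \textit{\textbf{x}}-\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\right) + \frac{1}{2}\left( \textit{\textbf{x}}-\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\right) ^{\top }\nabla ^2 f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}})\left( \textit{\textbf{x}}-\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\right) ,
\]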
where \(\nabla f(\varvec{\tilde{x}}^{c}_{\ell _{k}})\) is an approximation of the gradient at the centroid, and \(\nabla ^2 f(\varvec{\tilde{x}}^{c}_{\ell _{k}})\) is an approximation of the Hessian matrix. Hence, the Trust Region Sub-problem (TRS):
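\[
\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}} = \arg \min _{\textit{\textbf{x}} \in \mathbb {R}_{\ell _{k}}} \hat{f}_Q(\textit{\textbf{x}}),
\]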
where \({\tilde{\textit{\textbf{x}}}}^{*}_{\ell _{k}}\) becomes the candidate centroid for the next trust region iteration.
Two decisions need to be made by the algorithm to progress to the next iteration: (1) whether to accept the candidate centroid \({\tilde{\textit{\textbf{x}}}}^{*}_{\ell _{k}}\); and (2) whether to shrink or expand the size parameter \(r_{\ell _{k}}\). The ratio-comparison test answers these questions by comparing the true function value \(f(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}})\) to the value predicted by the quadratic model \(\hat{f}_Q(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}})\), and constructs the following statistic [9, 33],
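\[
\rho = \frac{f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}) - f(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}})}{\hat{f}_Q(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}) - \hat{f}_Q(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}})},
\]
i.e., the ratio of the actual reduction to the reduction predicted by the quadratic model.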
When \(\rho > 1\), the step has produced a better-than-predicted reduction in the objective function, while \(\rho < 0\) indicates that the current model is inadequate over the current trust region: a reduction in the objective value was predicted, but the trust region step produced a worse solution, i.e., \(f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}})<f(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}})\).
Comparing \(\rho \) with the user defined threshold values \(\eta _{1},\eta _{2}\), \(0<\eta _1\le \eta _2<1\), we have three possible cases:
- Case 1: \(\rho \le \eta _1 \implies \) keep the current centroid, \(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}+1} \leftarrow \tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\); reduce the trust region size, \(r_{\ell _{k}+1} \leftarrow r_{\ell _{k}} \cdot \omega \).
- Case 2: \(\eta _1 < \rho \le \eta _2 \implies \) accept the candidate centroid, \(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}+1} \leftarrow \tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}}\); keep the current trust region size, \(r_{\ell _{k}+1}\leftarrow r_{\ell _{k}}\).
- Case 3: \(\rho > \eta _2 \implies \) accept the candidate centroid, \(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}+1} \leftarrow \tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}}\); expand the trust region size, \(r_{\ell _{k}+1} \leftarrow r_{\ell _{k}} \cdot \gamma \),
where \(\omega \in (0,1)\) is the trust region reduction rate and \(\gamma > 1\) is the trust region expansion rate.
Under Case 1, we reformulate the subproblem (TRS) with the same local quadratic model but under more restrictive trust region constraints. Under Case 2 or Case 3, we recompute \(\nabla f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}})\) and \(\nabla ^2 f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}})\) in order to update the quadratic model \(\hat{f}_Q(\textit{\textbf{x}})\), since the trust region centroid has moved.
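A minimal sketch of this ratio-comparison update is given below. It is illustrative only: the function name, interface, and default threshold/rate values are hypothetical, and rebuilding the quadratic model after an accepted step (Cases 2 and 3) is left to the caller.

```python
def trust_region_update(x_c, x_star, r, f, f_q,
                        eta1=0.1, eta2=0.75, omega=0.5, gamma=2.0):
    """One ratio-comparison update, following Cases 1-3 above (illustrative sketch).

    x_c, x_star : current centroid and candidate centroid from the TRS
    r           : current trust region side length
    f, f_q      : true objective and current quadratic model (callables)
    Returns the next centroid and the next trust region side length.
    """
    # Ratio of actual to model-predicted reduction
    rho = (f(x_c) - f(x_star)) / (f_q(x_c) - f_q(x_star))

    if rho <= eta1:                    # Case 1: poor model agreement
        return x_c, r * omega          # keep centroid, shrink the trust region
    elif rho <= eta2:                  # Case 2: acceptable agreement
        return x_star, r               # move centroid, keep the trust region size
    else:                              # Case 3: better-than-predicted reduction
        return x_star, r * gamma       # move centroid, expand the trust region
```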
The following stopping criteria are used (in addition to the restart-related conditions):
- Criterion 1: \(\Vert \nabla f(\tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}}) \Vert < \epsilon _1\)
- Criterion 2: \(\Vert \varvec{\delta }_{\ell _{k}} \Vert < \epsilon _2\)
where \(\Vert \cdot \Vert \) is the Euclidean norm, \(\epsilon _{1}, \epsilon _{2} > 0\) are user-defined parameters, and \(\varvec{\delta }_{\ell _{k}} = \tilde{\textit{\textbf{x}}}^{*}_{\ell _{k}} - \tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}\) represents the step size between the current and proposed centroid. Note that \(\mathbb {R}_{\ell _{k}} \rightarrow \mathbb {X}\) as \(\ell _{k} \rightarrow \infty \), and \(||\varvec{\delta }_{\ell _{k}}||_{2} \rightarrow 0\) as \(\ell _{k} \rightarrow \infty \).
It was shown in [33] (Theorem 4.8) that, as the number of iterations grows, \(\ell _{k}\rightarrow \infty \):
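\[
\lim _{\ell _{k}\rightarrow \infty } \Vert \nabla f(\tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}}) \Vert = 0 \quad \text {and} \quad \tilde{\textit{\textbf{x}}}^{c}_{\ell _{k}} \in \mathcal {N}_\epsilon (\tilde{\textit{\textbf{x}}}^{c}_{\infty }) \ \text {for all sufficiently large } \ell _{k},
\]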
regardless of the initial starting location of the trust region search, where \(\tilde{\textit{\textbf{x}}}^{c}_{\infty }\) is the centroid in the limit, and \(\mathcal {N}_\epsilon (\tilde{\textit{\textbf{x}}}^{c}_{\infty })\) is an \(\epsilon \) radius ball centered at \(\tilde{\textit{\textbf{x}}}^{c}_{\infty }\). The result requires (1) Lipschitz continuity for f, (2) boundedness of f, (3) \(\eta _1 \in (0,0.25)\), and (4) uniform boundedness in norm of the Hessian of f by a constant \(\beta \).