A data-driven robust optimization algorithm for black-box cases: An application to hyper-parameter optimization of machine learning algorithms
Graphical abstract
Introduction
One of the methods to cope with the uncertainty in the data/parameters of optimization problems, which can lead to violations of feasibility and optimality, is the so-called robust optimization (RO). Soyster (1973) proposed a linear programming (LP) model in which the noisy input data belong to a convex set. This approach is too conservative, since it ensures feasibility for all uncertain realizations and therefore incurs a large cost in optimality. Significant progress toward less conservative approaches was made independently by El Ghaoui and Lebret (1997) and Ben-Tal and Nemirovski (1998). In their approach, the uncertainty sets are assumed to be ellipsoidal, and a counterpart model with deterministic parameters is solved. Ben-Tal and Nemirovski (1998) proposed replacing an uncertain LP problem by its robust counterpart and showed that the robust counterpart of an LP problem with an ellipsoidal uncertainty set is a conic quadratic program that can be solved in polynomial time. However, such a method cannot be directly applied to discrete optimization (Bertsimas & Sim, 2003). Another drawback is that it leads to nonlinear, albeit convex, models, which are more computationally demanding than the earlier linear models. Later, Bertsimas and Sim (2004) suggested using intervals as uncertainty sets, for which the uncertain LP is transformed into a deterministic linear program through duality theory. The main advantage of their proposal is the ability to budget the uncertainty, i.e., to ensure feasibility while controlling the number of uncertain parameters that are allowed to deviate simultaneously, as recalled below.
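For a single constraint $\sum_j \tilde a_{ij} x_j \le b_i$ whose coefficients vary in intervals $\tilde a_{ij} \in [a_{ij}-\hat a_{ij},\, a_{ij}+\hat a_{ij}]$ for $j \in J_i$, with at most $\Gamma_i$ of them deviating simultaneously, the Bertsimas–Sim robust counterpart is the linear system (standard material from Bertsimas & Sim, 2004, recalled here for reference rather than a result of this paper):

$$\sum_j a_{ij} x_j + z_i \Gamma_i + \sum_{j \in J_i} p_{ij} \le b_i, \qquad z_i + p_{ij} \ge \hat a_{ij}\, y_j \;\; \forall j \in J_i, \qquad -y_j \le x_j \le y_j \;\; \forall j, \qquad p_{ij},\, y_j,\, z_i \ge 0,$$

so the budget $\Gamma_i$ directly controls how many coefficients may take their worst-case values.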
Stochastic programming (SP) is another powerful modelling paradigm for optimization under uncertainty. A generic single-stage SP is $\min_{x \in X} \mathbb{E}_{\mathbb{P}}[h(x,\xi)]$, where the expectation is taken with respect to the distribution $\mathbb{P}$ of the random vector $\xi$ and $h(x,\xi)$ is a cost function that depends on the decision $x$ as well. However, classical SP is not well suited to large-scale decision-making problems (Esfahani & Kuhn, 2018). As a remedy, an intermediate approach between SP and RO, called distributionally robust optimization (DRO), was proposed in the literature, in which the uncertain data are governed by a distribution that is itself subject to uncertainty. This distribution belongs to an ambiguity set comprising all distributions that are compatible with the prior knowledge (Wiesemann et al., 2014). The motivation for this approach is the availability of rich and extensive historical data in recent years. The first study in this field was that of Scarf (1957) in the context of an inventory control problem. Esteban-Pérez and Morales (2019) categorized the available DRO methods into the following three major classes (a generic worst-case formulation common to all three is recalled after the list):
- Studies such as Dupačová (1987), Prékopa (1995), Bertsimas and Sethuraman (2000), Delage and Ye (2010), Zymler et al. (2013), Xin and Goldberg (2013), Mehrotra and Papp (2014), Gao and Kleywegt (2016), Nakao et al. (2017), and Liu et al. (2018), which consider ambiguity sets based on the distribution moments.
- Works in which the ambiguity set is defined as the set of all distributions whose dissimilarity to a prescribed distribution is less than or equal to a given value. This class has the following three subclasses:
  - (I) The Wasserstein ambiguity set, used in Shafieezadeh-Abadeh et al. (2017), Gao and Kleywegt (2016, 2017), Blanchet et al. (2017a, 2017b), and Esfahani and Kuhn (2018).
  - (II) The φ-divergence, utilized in Ben-Tal et al. (2013), Bayraksan and Love (2015), Moghaddam and Mahlooji (2016), and Namkoong and Duchi (2016).
  - (III) The likelihood ratio with respect to the historical data, used in Nilim and El Ghaoui (2005), Iyengar (2005), Wang et al. (2016), and Duchi et al. (2016).
- In the third class, the ambiguity set is based on all distributions that, given a sample, pass a prescribed hypothesis test; examples are Marla et al. (2018), Bertsimas et al. (2018), and Chen et al. (2019), who used this approach.
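Regardless of which of the three classes above is used to build the ambiguity set $\mathcal{P}$, the resulting distributionally robust program takes the standard worst-case form (standard notation, recalled here for reference):

$$\min_{x \in X} \; \sup_{\mathbb{Q} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{Q}}\big[h(x,\xi)\big],$$

where $\mathcal{P}$ collects all distributions consistent with the moment information, the dissimilarity ball, or the hypothesis test, respectively.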
Bertsimas et al. (2018) proposed using statistical hypothesis tests, an approach that is flexible, widely applicable, and tractable, both theoretically and practically. In addition, their optimal solution enjoys a strong, finite-sample guarantee when the constraints and the objective function are concave in the uncertainty. They described how to choose an appropriate set and applied their approach to multiple uncertain constraints. Nevertheless, their method requires a closed-form objective function, which is rarely available in real-world problems. As such, in this paper their approach is extended to handle objective functions that are black boxes and not given in closed form. To do this, a Gaussian meta-model is used in lieu of the true function, and the model proposed by Bertsimas et al. (2018) is then applied to it.
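To make the surrogate idea concrete, the following is a minimal Python sketch (not the authors' code): a Gaussian-process model is fitted to a handful of evaluations of a black-box objective and can then stand in for the true function. The names `black_box` and `surrogate_error`, the Matérn kernel, and the toy function are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): a Gaussian-process surrogate
# replaces an expensive black-box objective.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def black_box(x):
    # Stand-in for an expensive simulation or cross-validation run.
    return np.sin(3.0 * x[0]) + 0.5 * (x[1] - 0.2) ** 2

rng = np.random.default_rng(0)
X_design = rng.uniform(0.0, 1.0, size=(20, 2))           # design points
y_design = np.array([black_box(x) for x in X_design])    # expensive evaluations

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_design, y_design)                                # cheap meta-model of the black box

def surrogate_error(gp, f, rng, n_test=200):
    # Check how well the cheap meta-model mimics the expensive function.
    X_test = rng.uniform(0.0, 1.0, size=(n_test, 2))
    return np.max(np.abs(gp.predict(X_test) - np.array([f(x) for x in X_test])))

print("max surrogate error on test points:", surrogate_error(gp, black_box, rng))
```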
The rest of the paper is organized as follows. Section 2 provides a short review of robust optimization, Gaussian processes, data-driven robust optimization, and the Bayesian optimization method. The proposed data-driven robust optimization approach and the computational complexity of the proposed algorithm are explored in detail in Section 3. In Section 4, the performance of the suggested approach is demonstrated using test functions. Robust hyper-parameter optimization of machine learning algorithms, as one of the applications of the proposed algorithm, is addressed in Section 5. Finally, conclusions and future research directions are provided in Section 6.
Section snippets
Background
This section provides a brief background on robust optimization, Gaussian processes, data-driven robust optimization, and Bayesian optimization method.
Proposed algorithm
Putting all of the above preliminaries together, the following algorithm is proposed for minimization:
- Set the problem parameters.
- While the cross-validation criterion is not satisfied, do:
  - Set n (the number of design points).
  - Set the locations of the design points (X) by a Latin Hypercube Sampling (LHS) design.
  - Normalize X.
  - Calculate the responses at the design points.
  - For i = 1, …, n: perform a Kolmogorov–Smirnov test and find a distribution for each uncertain parameter at a predefined significance level.
  - Define a…
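A compact Python sketch of the loop described above is given below, assuming SciPy and scikit-learn; it strings together an LHS design, a Kolmogorov–Smirnov check of a candidate noise distribution at each design point, a Gaussian-process meta-model, and a worst-case minimization of the surrogate. All names are illustrative, and the cross-validation stopping rule and the exact construction of the data-driven uncertainty set in the actual DRSO algorithm are only hinted at.

```python
# Illustrative sketch of the steps listed above (not the authors' implementation).
import numpy as np
from scipy.stats import qmc, kstest
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def noisy_objective(x, rng=np.random.default_rng()):
    # Placeholder black box with additive noise; replace with the real simulator.
    return float(np.sum((x - 0.3) ** 2) + rng.normal(0.0, 0.05))

dim, n, reps = 2, 30, 15
X = qmc.LatinHypercube(d=dim, seed=0).random(n)           # LHS design, already in [0, 1]^dim
Y = np.array([[noisy_objective(x) for _ in range(reps)] for x in X])

# Kolmogorov-Smirnov test of a candidate (here normal) distribution at each design point.
for i in range(n):
    mu, sigma = Y[i].mean(), Y[i].std(ddof=1)
    result = kstest(Y[i], "norm", args=(mu, sigma))
    accepted = result.pvalue > 0.05                        # predefined significance level

# Gaussian-process meta-model of the mean response.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, Y.mean(axis=1))

def robust_surrogate(x, radius=0.05, n_pert=50):
    # Worst predicted value over a sampled neighbourhood of x (a crude uncertainty set).
    pert = np.random.default_rng(1).uniform(-radius, radius, size=(n_pert, dim))
    return gp.predict(np.clip(x + pert, 0.0, 1.0)).max()

res = minimize(robust_surrogate, x0=np.full(dim, 0.5), method="Nelder-Mead")
print("robust minimizer (scaled):", res.x, "worst-case prediction:", res.fun)
```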
Test functions
In order to evaluate the performance of the proposed approach, benchmark cases for data-driven robust optimization are required. The designs and the noise values of the test cases in this paper are similar to those used in Azizi et al. (2019), who employed the 12 test functions of Marzat et al. (2013).
Case studies
Machine learning algorithms have achieved a prominent position in many scientific and practical applications, which has led to an ever-growing demand for machine learning systems. This prominence is owed to their good performance, yet that performance relies heavily on choosing proper internal hyper-parameters (Feurer et al., 2015; Falkner et al., 2018). So, we design a way to automatically set these hyper-parameters to optimize the
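As a baseline illustration of this setting, the sketch below tunes two hyper-parameters of a random forest by plain Bayesian optimization over the cross-validated error, using a Gaussian-process surrogate and a lower-confidence-bound rule; the dataset, model, search ranges, and acquisition rule are assumptions made for illustration, and the robustness layer described in the paper is omitted.

```python
# Minimal sketch of automatic hyper-parameter tuning via a Gaussian-process surrogate
# over the cross-validated error (plain Bayesian optimization, no robust layer).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def cv_error(params):
    # params in [0, 1]^2 mapped to (n_estimators, max_depth); ranges are illustrative.
    n_estimators = int(10 + params[0] * 290)
    max_depth = int(2 + params[1] * 18)
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
    return 1.0 - cross_val_score(clf, X, y, cv=5).mean()

# Initial design, GP surrogate, and a simple "minimize the predicted error" loop.
P = rng.uniform(size=(8, 2))
E = np.array([cv_error(p) for p in P])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(P, E)
    cand = rng.uniform(size=(500, 2))
    mean, std = gp.predict(cand, return_std=True)
    nxt = cand[np.argmin(mean - 1.0 * std)]   # lower-confidence-bound acquisition
    P = np.vstack([P, nxt])
    E = np.append(E, cv_error(nxt))

best = P[np.argmin(E)]
print("best params (scaled):", best, "cv error:", E.min())
```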
Conclusion
The prevalence of high-quality data and the lack of a closed-form objective function in many problems led us to design a robust data-driven optimization method in this paper. This method is founded on the uncertainty sets proposed by Bertsimas et al. (2018) and uses a Gaussian meta-model. One of the advantages of the designed method is that the user can specify the desired degree of robustness of the solution. The DRSO algorithm is computationally tractable and was shown to have a complexity
CRediT authorship contribution statement
Farshad Seifi: Conceptualization, Validation, Investigation, Data curation, Writing – original draft, Visualization. Mohammad Javad Azizi: Methodology, Software, Formal analysis. Seyed Taghi Akhavan Niaki: Resources, Writing – review & editing, Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (55)
- Robust optimization: Lessons learned from aircraft routing. Computers & Operations Research (2018).
- Network design in scarce data environment using moment-based distributionally robust optimization. Computers & Operations Research (2017).
- A robust simulation optimization algorithm using kriging and particle swarm optimization: Application to surgery room optimization. Communications in Statistics-Simulation and Computation (2019).
- Bayraksan, G., & Love, D. K. (2015). Data-driven stochastic programming using phi-divergences. In The Operations Research...
- Robust convex optimization. Mathematics of Operations Research (1998).
- Robust optimization (2009).
- Robust solutions of optimization problems affected by uncertain probabilities. Management Science (2013).
- Moment problems and semidefinite optimization. Handbook of Semidefinite Programming (2000).
- Robust discrete optimization and network flows. Mathematical Programming (2003).
- The price of robustness. Operations Research (2004).
- Data-driven robust optimization. Mathematical Programming.
- Convex optimization.
- Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research.
- The minimax approach to stochastic programming and an illustrative application. Stochastics: An International Journal of Probability and Stochastic Processes.
- Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications.
- Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming.
- Efficient and robust automated machine learning. Advances in Neural Information Processing Systems.
- Hyperparameter optimization. Automated Machine Learning.
- Multi-objective model selection for support vector machines.