Abstract
Tuning the hyperparameters of machine learning models is crucial for their performance, and Bayesian optimization has recently emerged as the de facto method for this task. Hyperparameter tuning is usually performed by evaluating model performance on a validation set, and Bayesian optimization is used to find the hyperparameter setting that maximizes this validation performance. However, in many cases the function representing validation performance contains spurious sharp peaks caused by limited data, and Bayesian optimization then tends to converge to these sharp peaks rather than to broader, more stable ones. When a model trained with such hyperparameters is deployed in the real world, its performance suffers dramatically. We address this problem with a novel stable Bayesian optimization framework. We construct two new acquisition functions that steer Bayesian optimization away from sharp peaks, and we provide a theoretical analysis guaranteeing that Bayesian optimization with the proposed acquisition functions prefers stable peaks over unstable ones. Experiments on synthetic function optimization and on hyperparameter tuning for support vector machines demonstrate the effectiveness of the proposed framework.
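For context, the baseline setup the paper builds on can be sketched as follows: a Gaussian process surrogate models validation performance as a function of the hyperparameters, and an acquisition function selects the next hyperparameter set to evaluate. The sketch below is purely illustrative and uses plain expected improvement, not the paper's stable acquisition functions; the dataset (scikit-learn's breast-cancer data), the search ranges for C and gamma, and the random candidate-sampling strategy are assumptions made for the example only.

```python
# Illustrative sketch only: standard Bayesian optimization of SVM hyperparameters
# with a GP surrogate and expected improvement (EI). The paper's stable acquisition
# functions are NOT implemented here; this shows the baseline setup they modify.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
data, target = load_breast_cancer(return_X_y=True)

def validation_score(log_C, log_gamma):
    """Cross-validated accuracy of an SVM; this plays the role of the objective f(x)."""
    clf = SVC(C=10.0 ** log_C, gamma=10.0 ** log_gamma)
    return cross_val_score(clf, data, target, cv=3).mean()

# Search space: x = (log10 C, log10 gamma); ranges are assumptions for this example.
bounds = np.array([[-2.0, 3.0], [-5.0, 0.0]])

def sample_space(n):
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, 2))

# Initial random design.
X = sample_space(5)
y = np.array([validation_score(*x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(candidates, best_y, xi=0.01):
    """Standard EI for maximization (not the paper's stable acquisition functions)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

for it in range(15):
    gp.fit(X, y)
    candidates = sample_space(2000)        # random candidate pool
    ei = expected_improvement(candidates, y.max())
    x_next = candidates[np.argmax(ei)]     # maximize the acquisition
    y_next = validation_score(*x_next)
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)
    print(f"iter {it:2d}  best CV accuracy so far: {y.max():.4f}")

best = X[np.argmax(y)]
print(f"best hyperparameters: C=10^{best[0]:.2f}, gamma=10^{best[1]:.2f}")
```

In this baseline, the argmax of the acquisition can land on a sharp, unreliable peak of the cross-validation surface; that failure mode is exactly what the stable acquisition functions proposed in the paper are designed to avoid.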
Notes
The highest stable peak region is \(0\le \mathbf {x}\le 0.125\).
References
Azimi, J., Fern, A., Fern, X.Z.: Batch Bayesian optimization via simulation matching. Adv. Neural Inf. Process. Syst. 1, 109–117 (2010)
Brochu, E., Cora, V.M., De Freitas, N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010)
Bull, A.D.: Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res. 12, 2879–2904 (2011)
Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional Gaussian processes. arXiv preprint arXiv:1206.6396 (2012)
Garnett, R., Osborne, M.A., Roberts, S.J.: Bayesian optimization for sensor set selection. In: IPSN (2010)
Gelbart, M.A., Snoek, J., Adams, R.P.: Bayesian optimization with unknown constraints. arXiv preprint arXiv:1403.5607 (2014)
Girard, A., Murray-Smith, R.: Gaussian processes: prediction at a noisy input and application to iterative multiple-step ahead forecasting of time-series. In: Murray-Smith, R., Shorten, R. (eds.) Switching and Learning in Feedback Systems, pp. 158–184. Springer, Berlin (2005)
Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13(4), 455–492 (1998)
Joy, T.T., Rana, S., Gupta, S.K., Venkatesh, S.: Flexible transfer learning framework for Bayesian optimisation. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 102–114. Springer, Berlin (2016)
Laumanns, M., Ocenasek, J.: Bayesian optimization algorithms for multi-objective optimization. In: PPSN (2002)
Lizotte, D.J., Wang, T., Bowling, M.H., Schuurmans, D.: Automatic gait optimization with Gaussian process regression. IJCAI 7, 944–949 (2007)
Martinez-Cantin, R., et al.: A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Auton. Robots 27(2), 93–103 (2009)
Mockus, J., Tiesis, V., Zilinskas, A.: The application of Bayesian methods for seeking the extremum. Towards Glob. Optim. 2, 117–129 (1978)
Nguyen, T.D., Gupta, S., Rana, S., Venkatesh, S.: Stable Bayesian optimization. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 578–591. Springer, Berlin (2017)
Nguyen, V., Rana, S., Gupta, S.K., Li, C., Venkatesh, S.: Budgeted batch Bayesian optimization. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1107–1112. IEEE (2016)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: NIPS, pp. 2951–2959 (2012)
Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., Adams, R.: Scalable Bayesian optimization using deep neural networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2171–2180 (2015)
Srinivas, N., Krause, A., Seeger, M., Kakade, S.M.: Gaussian process optimization in the bandit setting: no regret and experimental design. In: ICML (2010)
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: ACM SIGKDD (2013)
Wang, Z., de Freitas, N.: Theoretical analysis of Bayesian optimisation with unknown Gaussian process hyper-parameters. arXiv preprint arXiv:1406.7758 (2014)
Xue, D., et al.: Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 11241 (2016)
Acknowledgements
This research was partially funded by the Australian Government through the Australian Research Council (ARC) and the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning. Prof Venkatesh is the recipient of an ARC Australian Laureate Fellowship (FL170100006).
Ethics declarations
Conflicts of interest
All the authors declare that they have no conflict of interest.
Additional information
This paper is an extended version of the PAKDD 2017 long-presentation paper "Stable Bayesian Optimization" [15].
About this article
Cite this article
Nguyen, T.D., Gupta, S., Rana, S. et al. Stable Bayesian optimization. Int J Data Sci Anal 6, 327–339 (2018). https://doi.org/10.1007/s41060-018-0119-9