Batch Bayesian optimization via adaptive local search

Abstract

Bayesian optimization (BO) provides an efficient tool for solving black-box global optimization problems. When multiple points can be evaluated simultaneously, batch Bayesian optimization is a popular extension that makes full use of computational and experimental resources. In this paper, an adaptive local search strategy is investigated for selecting batch points in Bayesian optimization. First, a multi-start strategy is combined with a gradient-based optimization method to maximize the acquisition function. Second, an automatic clustering approach (e.g., X-means) is applied to adaptively identify the acquisition function's local maxima from the gradient-based optimization results. Third, a Bayesian stopping criterion is used to guarantee that, in theory, all local maxima can be obtained. Finally, a lower confidence bound criterion and a front-end truncation operation are employed to select the most promising local maxima as batch points. Extensive evaluations on various synthetic functions and on two hyperparameter-tuning problems for deep learning models verify the proposed method.
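
Read operationally, the abstract describes a batch-selection loop: run many gradient-based ascents of the acquisition function from random starts, group the converged points into distinct local maxima, and keep the most promising maxima as the batch. The sketch below illustrates that idea in Python; it is not the authors' implementation. Assumptions: a scikit-learn GaussianProcessRegressor stands in for the surrogate (the paper does not prescribe a library), expected improvement stands in for the acquisition function, the X-means clustering and Bayesian stopping criterion are simplified to a fixed number of restarts plus distance-based deduplication, and the lower-confidence-bound screening is reduced to ranking candidates by acquisition value.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def make_acquisition(gp, y_best):
    """Expected improvement for minimization (stand-in acquisition)."""
    def ei(x):
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        mu, sigma = mu[0], sigma[0]
        z = (y_best - mu) / (sigma + 1e-9)
        return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei

def select_batch(acq, bounds, n_starts=50, batch_size=4, tol=1e-3):
    """Multi-start gradient-based maximization of the acquisition,
    with distance-based deduplication standing in for the paper's
    X-means identification of distinct local maxima."""
    bounds = np.asarray(bounds, dtype=float)          # shape (d, 2)
    lo, hi = bounds[:, 0], bounds[:, 1]
    starts = np.random.uniform(lo, hi, size=(n_starts, len(bounds)))

    # Ascend the acquisition from each start (L-BFGS-B on its negation).
    runs = []
    for x0 in starts:
        res = minimize(lambda x: -acq(x), x0, method="L-BFGS-B",
                       bounds=bounds)
        runs.append((res.x, -res.fun))

    # Merge converged points closer than `tol`: same local maximum.
    distinct = []
    for x, val in sorted(runs, key=lambda p: -p[1]):
        if all(np.linalg.norm(x - d[0]) > tol for d in distinct):
            distinct.append((x, val))

    # Keep the batch_size most promising distinct local maxima.
    return np.array([d[0] for d in distinct[:batch_size]])

# Toy usage: one batch step for f(x) = ||x||^2 on [-2, 2]^2.
if __name__ == "__main__":
    f = lambda x: float(np.sum(x ** 2))
    X = np.random.uniform(-2, 2, size=(10, 2))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X, y)
    batch = select_batch(make_acquisition(gp, y.min()), [(-2, 2), (-2, 2)])
    print(batch)  # up to 4 well-separated candidate points
```

Replacing the deduplication with X-means and restarting until a Bayesian stopping rule judges that all basins of the acquisition have likely been visited would recover the adaptive behavior the abstract describes.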




Acknowledgments

The authors sincerely thank the three reviewers and the associate editors for their thoughtful feedback, which helped greatly to improve this paper.

Author information


Corresponding author

Correspondence to Chao Jiang.

Ethics declarations

Conflict of interests

The authors have no financial interests or relationships that could influence the content reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was financially supported by the National Key R&D Program of China (2018YFB1701400).


About this article


Cite this article

Liu, J., Jiang, C. & Zheng, J. Batch Bayesian optimization via adaptive local search. Appl Intell 51, 1280–1295 (2021). https://doi.org/10.1007/s10489-020-01790-5

