Abstract
Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and on an acquisition function to suggest candidate points. It is well known that BO does not scale well to high-dimensional problems: the GPR model requires substantially more data points to achieve sufficient accuracy, and optimizing the acquisition function becomes computationally expensive in high dimensions. Several recent works address these issues, e.g., by implementing online variable selection or by conducting the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous PCA-BO algorithm, which learns a linear sub-manifold, this paper proposes a novel kernel-PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold improves the modeling accuracy without requiring many more data points from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making its optimization more manageable.
We compare the performance of KPCA-BO to vanilla BO and to PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant as the dimensionality increases. For the 60D functions, KPCA-BO achieves better results than PCA-BO on many test cases. Compared to vanilla BO, it also substantially reduces the CPU time required to train the GPR model and to optimize the acquisition function.
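To make the loop described above concrete, here is a minimal, hypothetical sketch of the KPCA-then-BO idea using scikit-learn and SciPy. The quadratic toy objective, the RBF/Matérn kernel choices, scikit-learn's built-in pre-image reconstruction, and the random-restart L-BFGS-B acquisition optimizer are all illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Cheap stand-in for an expensive black-box objective (hypothetical).
    return float(np.sum((x - 0.3) ** 2))

dim, reduced_dim, n_init, budget = 20, 2, 10, 30
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(n_init, dim))   # initial design
y = np.array([objective(x) for x in X])

for _ in range(budget - n_init):
    # 1. Learn a non-linear sub-manifold from the points evaluated so far.
    kpca = KernelPCA(n_components=reduced_dim, kernel="rbf",
                     fit_inverse_transform=True)  # enables pre-image mapping
    Z = kpca.fit_transform(X)

    # 2. Train the GPR surrogate in the reduced space.
    gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                   normalize_y=True).fit(Z, y)

    # 3. Maximize expected improvement (EI) in the reduced space,
    #    here with a simple random-restart L-BFGS-B loop.
    y_best = y.min()

    def neg_ei(z):
        mu, sigma = gpr.predict(z.reshape(1, -1), return_std=True)
        mu, sigma = mu[0], max(sigma[0], 1e-12)
        gamma = (y_best - mu) / sigma
        return -(sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma)))

    starts = rng.uniform(Z.min(axis=0), Z.max(axis=0),
                         size=(10, reduced_dim))
    best = min((minimize(neg_ei, z0, method="L-BFGS-B") for z0 in starts),
               key=lambda res: res.fun)

    # 4. Map the candidate back to the original space (pre-image) and evaluate.
    x_new = kpca.inverse_transform(best.x.reshape(1, -1)).ravel()
    X = np.vstack([X, x_new])
    y = np.append(y, objective(x_new))

print("best value found:", y.min())
```

Note that both the surrogate training in step 2 and the acquisition optimization in step 3 run entirely in the reduced space, which is where the CPU-time savings over vanilla BO reported above come from.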
K. Antonov—Work done while visiting Sorbonne Université in Paris.
Notes
- 1. The outer product is a linear operator defined by \([\phi(\mathbf{x})\phi(\mathbf{x})^\top]: h \mapsto \langle \phi(\mathbf{x}), h \rangle_{\mathcal{H}}\,\phi(\mathbf{x})\) for all \(h \in \mathcal{H}\). Hence, the sample covariance is also a linear operator \(C:\mathcal{H} \rightarrow \mathcal{H}\).
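Spelled out, and assuming the usual kernel-PCA convention that the feature vectors \(\phi(\mathbf{x}_i)\) have been centered in \(\mathcal{H}\), the sample covariance operator over \(n\) points and its action on an element \(h\) read
\[
C = \frac{1}{n}\sum_{i=1}^{n} \phi(\mathbf{x}_i)\,\phi(\mathbf{x}_i)^\top, \qquad C h = \frac{1}{n}\sum_{i=1}^{n} \langle \phi(\mathbf{x}_i), h \rangle_{\mathcal{H}}\, \phi(\mathbf{x}_i) \quad \text{for all } h \in \mathcal{H}.
\]
Kernel PCA never forms \(C\) explicitly; its eigenvectors are recovered from the \(n \times n\) kernel matrix \(K_{ij} = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle_{\mathcal{H}}\), so only kernel evaluations are needed.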
Acknowledgments
Our work is supported by the Paris Ile-de-France region (via the DIM RFSI project AlgoSelect), by the CNRS INS2I institute (via the RandSearch project), by the PRIME programme of the German Academic Exchange Service (DAAD) with funds from the German Federal Ministry of Education and Research (BMBF), and by RFBR and CNRS, project number 20-51-15009.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Antonov, K., Raponi, E., Wang, H., Doerr, C. (2022). High Dimensional Bayesian Optimization with Kernel Principal Component Analysis. In: Rudolph, G., Kononova, A.V., Aguirre, H., Kerschke, P., Ochoa, G., Tušar, T. (eds) Parallel Problem Solving from Nature – PPSN XVII. PPSN 2022. Lecture Notes in Computer Science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_9
DOI: https://doi.org/10.1007/978-3-031-14714-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14713-5
Online ISBN: 978-3-031-14714-2
eBook Packages: Computer Science (R0)