Abstract
Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and on an acquisition function to suggest candidate points. It is well known that BO does not scale well to high-dimensional problems: the GPR model requires substantially more data points to achieve sufficient accuracy, and optimizing the acquisition function becomes computationally expensive in high dimensions. Several recent works address these issues, e.g., by implementing online variable selection or by conducting the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous PCA-BO algorithm, which learns a linear sub-manifold, this paper proposes a novel kernel-PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold improves the modeling accuracy without requiring many more data points from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making its optimization more manageable.
We compare the performance of KPCA-BO to vanilla BO and to PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant as the dimensionality increases. For the 60D functions, KPCA-BO achieves better results than PCA-BO on many test cases. Compared to vanilla BO, it also substantially reduces the CPU time required to train the GPR model and to optimize the acquisition function.
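To make the loop described above concrete, here is a minimal, hypothetical sketch of the KPCA-then-BO idea using scikit-learn and SciPy. The quadratic toy objective, the RBF/Matérn kernel choices, scikit-learn's built-in pre-image reconstruction, and the random-restart L-BFGS-B acquisition optimizer are all illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Cheap stand-in for an expensive black-box objective (hypothetical).
    return float(np.sum((x - 0.3) ** 2))

dim, reduced_dim, n_init, budget = 20, 2, 10, 30
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(n_init, dim))   # initial design
y = np.array([objective(x) for x in X])

for _ in range(budget - n_init):
    # 1. Learn a non-linear sub-manifold from the points evaluated so far.
    kpca = KernelPCA(n_components=reduced_dim, kernel="rbf",
                     fit_inverse_transform=True)  # enables pre-image mapping
    Z = kpca.fit_transform(X)

    # 2. Train the GPR surrogate in the reduced space.
    gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                   normalize_y=True).fit(Z, y)

    # 3. Maximize expected improvement (EI) in the reduced space,
    #    here with a simple random-restart L-BFGS-B loop.
    y_best = y.min()

    def neg_ei(z):
        mu, sigma = gpr.predict(z.reshape(1, -1), return_std=True)
        mu, sigma = mu[0], max(sigma[0], 1e-12)
        gamma = (y_best - mu) / sigma
        return -(sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma)))

    starts = rng.uniform(Z.min(axis=0), Z.max(axis=0),
                         size=(10, reduced_dim))
    best = min((minimize(neg_ei, z0, method="L-BFGS-B") for z0 in starts),
               key=lambda res: res.fun)

    # 4. Map the candidate back to the original space (pre-image) and evaluate.
    x_new = kpca.inverse_transform(best.x.reshape(1, -1)).ravel()
    X = np.vstack([X, x_new])
    y = np.append(y, objective(x_new))

print("best value found:", y.min())
```

Note that both the surrogate training in step 2 and the acquisition optimization in step 3 run entirely in the reduced space, which is where the CPU-time savings over vanilla BO reported above come from.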
K. Antonov—Work done while visiting Sorbonne Université in Paris.
Notes
- 1. The outer product is a linear operator defined by \([\phi(\mathbf{x})\phi(\mathbf{x})^\top]: h \mapsto \langle \phi(\mathbf{x}), h \rangle_{\mathcal{H}}\,\phi(\mathbf{x})\) for all \(h \in \mathcal{H}\). Hence, the sample covariance is also a linear operator \(C:\mathcal{H} \rightarrow \mathcal{H}\).
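Spelled out, and assuming the usual kernel-PCA convention that the feature vectors \(\phi(\mathbf{x}_i)\) have been centered in \(\mathcal{H}\), the sample covariance operator over \(n\) points and its action on an element \(h\) read
\[
C = \frac{1}{n}\sum_{i=1}^{n} \phi(\mathbf{x}_i)\,\phi(\mathbf{x}_i)^\top, \qquad C h = \frac{1}{n}\sum_{i=1}^{n} \langle \phi(\mathbf{x}_i), h \rangle_{\mathcal{H}}\, \phi(\mathbf{x}_i) \quad \text{for all } h \in \mathcal{H}.
\]
Kernel PCA never forms \(C\) explicitly; its eigenvectors are recovered from the \(n \times n\) kernel matrix \(K_{ij} = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle_{\mathcal{H}}\), so only kernel evaluations are needed.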
Acknowledgments
Our work is supported by the Paris Ile-de-France region (via the DIM RFSI project AlgoSelect), by the CNRS INS2I institute (via the RandSearch project), by the PRIME programme of the German Academic Exchange Service (DAAD) with funds from the German Federal Ministry of Education and Research (BMBF), and by RFBR and CNRS, project number 20-51-15009.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Antonov, K., Raponi, E., Wang, H., Doerr, C. (2022). High Dimensional Bayesian Optimization with Kernel Principal Component Analysis. In: Rudolph, G., Kononova, A.V., Aguirre, H., Kerschke, P., Ochoa, G., Tušar, T. (eds) Parallel Problem Solving from Nature – PPSN XVII. PPSN 2022. Lecture Notes in Computer Science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_9
DOI: https://doi.org/10.1007/978-3-031-14714-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14713-5
Online ISBN: 978-3-031-14714-2
eBook Packages: Computer Science (R0)