Abstract
Gaussian Processes (GP) are a powerful framework for modeling expensive black-box functions and have thus been adopted for various challenging modeling and optimization problems. In GP-based modeling, we typically default to a stationary covariance kernel to model the underlying function over the input domain, but many real-world applications, such as controls and cyber-physical system safety, often require modeling and optimization of functions that are locally stationary and globally non-stationary across the domain; using standard GPs with a stationary kernel often yields poor modeling performance in such scenarios. In this paper, we propose a novel modeling technique called Class-GP (Class Gaussian Process) to model a class of heterogeneous functions, i.e., non-stationary functions which can be divided into locally stationary functions over the partitions of input space with one active stationary function in each partition. We provide theoretical insights into the modeling power of Class-GP and demonstrate its benefits over standard modeling techniques via extensive empirical evaluations.
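To make the piecewise idea concrete, the following is a minimal sketch of the modeling strategy the abstract describes: one stationary GP per partition, with predictions routed by partition membership. The partition boundaries, RBF length-scale, jitter, and test function below are illustrative assumptions, not the paper's experimental setup, and the GP regression is a plain NumPy implementation rather than the authors' code.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Stationary squared-exponential kernel on 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior_mean(X_train, y_train, X_test, length_scale=0.1, noise=1e-4):
    """GP regression posterior mean with an RBF kernel and small jitter."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)

# Heterogeneous target: locally stationary on each piece, globally non-stationary.
def f(x):
    return np.where(x < 0.5, np.sin(20 * x), 0.1 * x)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 80))
y = f(X)

partitions = [(0.0, 0.5), (0.5, 1.0)]  # assumed known partition of [0, 1]

def class_gp_predict(x_query):
    """Predict with the single active per-partition GP at each query point."""
    x_query = np.asarray(x_query, dtype=float)
    out = np.empty_like(x_query)
    for i, (lo, hi) in enumerate(partitions):
        last = i == len(partitions) - 1
        train_mask = (X >= lo) & ((X <= hi) if last else (X < hi))
        query_mask = (x_query >= lo) & ((x_query <= hi) if last else (x_query < hi))
        if query_mask.any():
            out[query_mask] = gp_posterior_mean(
                X[train_mask], y[train_mask], x_query[query_mask]
            )
    return out
```

A single stationary RBF kernel would have to compromise between the rapidly varying sine region and the nearly linear region; fitting an independent GP per partition lets each length-scale (here shared for brevity) specialize to its locally stationary piece.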
References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Routledge, Milton Park (2017)
Candelieri, A., Pedrielli, G.: Treed-Gaussian processes with support vector machines as nodes for nonstationary Bayesian optimization. In: 2021 Winter Simulation Conference (WSC), pp. 1–12. IEEE (2021)
Davis, C.B., Hans, C.M., Santner, T.J.: Prediction of non-stationary response functions using a Bayesian composite Gaussian process. Comput. Stat. Data Anal. 154, 107083 (2021)
Fuentes, M., Smith, R.L.: A new class of nonstationary spatial models. Technical report, North Carolina State University, Department of Statistics (2001)
Gibbs, M.N.: Bayesian Gaussian processes for regression and classification. Ph.D. thesis, Citeseer (1998)
Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. J. Am. Stat. Assoc. 103(483), 1119–1130 (2008)
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)
Heinonen, M., Mannerström, H., Rousu, J., Kaski, S., Lähdesmäki, H.: Non-stationary Gaussian process regression with Hamiltonian Monte Carlo. In: Gretton, A., Robert, C.C. (eds.) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, Cadiz, Spain, vol. 51, pp. 732–740. PMLR (2016)
Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
Kim, H.M., Mallick, B.K., Holmes, C.C.: Analyzing nonstationary spatial data using piecewise gaussian processes. J. Am. Stat. Assoc. 100(470), 653–668 (2005)
Lederer, A., Umlauft, J., Hirche, S.: Uniform error bounds for Gaussian process regression with application to safe control. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)
Loh, W.Y.: Classification and regression trees. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 1(1), 14–23 (2011)
Malu, M., Dasarathy, G., Spanias, A.: Bayesian optimization in high-dimensional spaces: a brief survey. In: 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), pp. 1–8. IEEE (2021)
Marmin, S., Ginsbourger, D., Baccou, J., Liandrat, J.: Warped gaussian processes and derivative-based sequential designs for functions with heterogeneous variations. SIAM/ASA J. Uncertain. Quantif. 6(3), 991–1018 (2018)
Mathesen, L., Yaghoubi, S., Pedrielli, G., Fainekos, G.: Falsification of cyber-physical systems with robustness uncertainty quantification through stochastic optimization with adaptive restart. In: 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), pp. 991–997. IEEE (2019)
Paciorek, C.J., Schervish, M.J.: Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5), 483–506 (2006)
Paciorek, C.J.: Nonstationary Gaussian processes for regression and spatial modelling. Ph.D. thesis, Carnegie Mellon University (2003)
Pope, C.A., et al.: Gaussian process modeling of heterogeneity and discontinuities using Voronoi tessellations. Technometrics 63(1), 53–63 (2021)
Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28650-9_4
Schmidt, A.M., O’Hagan, A.: Bayesian inference for non-stationary spatial covariance structure via spatial deformations. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 65(3), 743–758 (2003)
Schulz, E., Speekenbrink, M., Krause, A.: A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018)
Acknowledgements
This work is supported in part by the National Science Foundation (NSF) under awards 2200161, 2048223, 2003111, 2046588, 2134256, 1815361, 2031799, 2205080, 1901243, and 1540040, by the DARPA ARCOS program under contract FA8750-20-C-0507, by the Lockheed Martin-funded contract FA8750-22-9-0001, and by the SenSIP Center.
A Appendix
The proof sketch for Theorem 1 follows along the lines of the proof of Theorem 3.1 in [11]. From [11], we obtain probabilistic uniform error bounds for the GP in each partition \(j \in [p]\); we then use these per-partition bounds to bound the overall function and to derive a bound on the \(L_1\) norm. The proofs of the theorem and corollary are as follows:
Proof
The following bound holds on each partition \(j \in [p]\) with probability \(1-\delta _j\):
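Based on Theorem 3.1 in [11] and the combination step used below, the per-partition bound has the form (a reconstruction matched to the notation of this proof, not the paper's verbatim display):

\[ |f(\textbf{x}) - \mu _{n_j}(\textbf{x})| \le \sqrt{\beta _j(r)}\,\sigma _{n_j}(\textbf{x}) + \gamma _j(r), \qquad \forall \, \textbf{x} \in \mathcal {X}_j, \]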
where \(\beta _j(r)\) and \(\gamma _j(r)\) are given as follows
Now, to bound the entire function, let us consider the difference \(|f(\textbf{x}) - \mu _n(\textbf{x})|\).
The last inequality (17) follows from (12) and holds with probability \(1-\delta \), where \(\delta = \sum _{j=1}^{p} \mathbb {1}\{x\in \mathcal {X}_j\} \delta _j\).
Now, redefining \(\sum _{j=1}^{p} \mathbb {1}\{x\in \mathcal {X}_j\} \left( \sqrt{\beta _j(r)}\sigma _{n_j}(\textbf{x})\right) = \sqrt{\beta (r)}\sigma _{n}(\textbf{x})\) and
\(\sum _{j=1}^{p} \mathbb {1}\{x\in \mathcal {X}_j\} \gamma _j(r) = \gamma (r)\), we have the result. \(\square \)
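Written out, the combined bound from this last step reads (a sketch consistent with the redefinitions above):

\[ |f(\textbf{x}) - \mu _n(\textbf{x})| \le \sum _{j=1}^{p} \mathbb {1}\{x\in \mathcal {X}_j\} \left( \sqrt{\beta _j(r)}\,\sigma _{n_j}(\textbf{x}) + \gamma _j(r) \right) = \sqrt{\beta (r)}\,\sigma _n(\textbf{x}) + \gamma (r), \]

which holds with probability at least \(1-\delta \).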
The proof of Corollary 1 uses the high-confidence bound (10) and proceeds as follows:
Proof
We know that the \(L_1\) norm is given by
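The \(L_1\) norm of the error can be bounded by integrating the pointwise high-confidence bound over the domain (a sketch under the notation above, not the paper's verbatim display):

\[ \Vert f - \mu _n \Vert _1 = \int _{\mathcal {X}} |f(\textbf{x}) - \mu _n(\textbf{x})| \, d\textbf{x} \le \int _{\mathcal {X}} \left( \sqrt{\beta (r)}\,\sigma _n(\textbf{x}) + \gamma (r) \right) d\textbf{x}, \]

with probability at least \(1-\delta \),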
where \(\delta = \sum _{j=1}^p \delta _j\) and \(\delta _j = M(r,\mathcal {X}_j)\exp (-\beta _j(r)/2)\). \(\square \)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Malu, M., Pedrielli, G., Dasarathy, G., Spanias, A. (2023). Class GP: Gaussian Process Modeling for Heterogeneous Functions. In: Sellmann, M., Tierney, K. (eds) Learning and Intelligent Optimization. LION 2023. Lecture Notes in Computer Science, vol 14286. Springer, Cham. https://doi.org/10.1007/978-3-031-44505-7_28
Print ISBN: 978-3-031-44504-0
Online ISBN: 978-3-031-44505-7