Abstract
In business and academia we continuously try to model and analyze complex processes in order to gain insight and to optimize. One of the most popular modeling techniques is Kriging, also known as Gaussian process regression. A major bottleneck of Kriging is its processing time of at least \(O(n^3)\) and its memory requirement of \(O(n^2)\) when applied to medium-sized or large data sets. For the big data sets that are increasingly available these days, Kriging is therefore computationally infeasible. As a solution to this problem we introduce a hybrid approach in which a number of Kriging models, built on disjoint subsets of the data, are optimally weighted to form predictions. The proposed model is far more efficient than standard global Kriging in both processing time and memory, while performing equally well in terms of accuracy. The proposed algorithm also scales better and is well suited for parallelization.
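The idea sketched in the abstract can be illustrated in a few lines: partition the data into disjoint subsets, fit an independent Kriging model per subset, and weight the per-model predictions. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: it uses k-means for the partitioning and inverse-predictive-variance weights (the minimum-variance combination of independent unbiased predictors) as one plausible choice of "optimal" weighting; the data set, kernel, and regularization `alpha` are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy data: a noisy 1-D sine wave (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(300, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(300)

# Partition the data into k disjoint subsets (k-means is one option).
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Fit one Kriging model per subset: roughly k * O((n/k)^3) time and
# k * O((n/k)^2) memory instead of O(n^3) / O(n^2) for one global model.
models = [
    GaussianProcessRegressor(alpha=0.01).fit(X[labels == i], y[labels == i])
    for i in range(k)
]

def predict(x_new):
    """Weighted combination of the per-cluster predictions."""
    mus, sigmas = zip(*(m.predict(x_new, return_std=True) for m in models))
    mus, sigmas = np.array(mus), np.array(sigmas)
    # Inverse-variance weights (normalized to sum to 1): models that are
    # uncertain at x_new, e.g. far from their own cluster, get little weight.
    w = 1.0 / sigmas**2
    w /= w.sum(axis=0)
    return (w * mus).sum(axis=0)

print(predict(np.array([[0.5]])))
```

Because each of the k models only needs its own subset, the fitting loop is trivially parallelizable, which matches the scalability claim in the abstract.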
Notes
- 1. There are asymptotically faster algorithms for inverting a matrix, e.g. Strassen's \(O(n^{2.807})\) and Stothers' \(O(n^{2.373})\).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
van Stein, B., Wang, H., Kowalczyk, W., Bäck, T., Emmerich, M. (2015). Optimally Weighted Cluster Kriging for Big Data Regression. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science, vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_27
DOI: https://doi.org/10.1007/978-3-319-24465-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24464-8
Online ISBN: 978-3-319-24465-5