Abstract
In business and academia we continuously try to model and analyze complex processes in order to gain insight and to optimize. One of the most popular modeling techniques is Kriging, also known as Gaussian process regression. A major bottleneck of Kriging is its processing time of at least \(O(n^3)\) and its memory requirement of \(O(n^2)\) when applied to medium-sized or large data sets. For the big data sets that are increasingly available these days, Kriging is therefore computationally infeasible. As a solution to this problem we introduce a hybrid approach in which a number of Kriging models, built on disjoint subsets of the data, are optimally weighted to form predictions. The proposed model is far more efficient than standard global Kriging in both processing time and memory, while performing equally well in terms of accuracy. The proposed algorithm also scales better and is well suited for parallelization.
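The idea sketched in the abstract can be illustrated in a few lines: partition the data into disjoint subsets, fit an independent Kriging model per subset, and weight the per-model predictions. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: it uses k-means for the partitioning and inverse-predictive-variance weights (the minimum-variance combination of independent unbiased predictors) as one plausible choice of "optimal" weighting; the data set, kernel, and regularization `alpha` are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy data: a noisy 1-D sine wave (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(300, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(300)

# Partition the data into k disjoint subsets (k-means is one option).
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Fit one Kriging model per subset: roughly k * O((n/k)^3) time and
# k * O((n/k)^2) memory instead of O(n^3) / O(n^2) for one global model.
models = [
    GaussianProcessRegressor(alpha=0.01).fit(X[labels == i], y[labels == i])
    for i in range(k)
]

def predict(x_new):
    """Weighted combination of the per-cluster predictions."""
    mus, sigmas = zip(*(m.predict(x_new, return_std=True) for m in models))
    mus, sigmas = np.array(mus), np.array(sigmas)
    # Inverse-variance weights (normalized to sum to 1): models that are
    # uncertain at x_new, e.g. far from their own cluster, get little weight.
    w = 1.0 / sigmas**2
    w /= w.sum(axis=0)
    return (w * mus).sum(axis=0)

print(predict(np.array([[0.5]])))
```

Because each of the k models only needs its own subset, the fitting loop is trivially parallelizable, which matches the scalability claim in the abstract.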
Notes
- 1. There are asymptotically faster algorithms for inverting a matrix, e.g. Strassen's \(O(n^{2.807})\) and Stothers' \(O(n^{2.373})\).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
van Stein, B., Wang, H., Kowalczyk, W., Bäck, T., Emmerich, M. (2015). Optimally Weighted Cluster Kriging for Big Data Regression. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science, vol 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_27
DOI: https://doi.org/10.1007/978-3-319-24465-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24464-8
Online ISBN: 978-3-319-24465-5