
Optimally Weighted Cluster Kriging for Big Data Regression

  • Conference paper
  • In: Advances in Intelligent Data Analysis XIV (IDA 2015)

Part of the book series: Lecture Notes in Computer Science, volume 9385

Abstract

In business and academia we continuously try to model and analyze complex processes in order to gain insight and to optimize. One of the most popular modeling algorithms is Kriging, also known as Gaussian process regression. A major bottleneck of Kriging is that it requires at least \(O(n^3)\) processing time and \(O(n^2)\) memory, which makes the algorithm infeasible for the medium-sized to big data sets that are increasingly common. As a solution to this problem, we introduce a hybrid approach in which a number of Kriging models, each built on a disjoint subset of the data, are optimally weighted to produce predictions. The proposed model is much more efficient than standard global Kriging in both processing time and memory, while performing equally well in terms of accuracy. It also scales better and is well suited for parallelization.
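The hybrid approach described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes k-means for the partitioning and inverse-predictive-variance weights for combining the per-cluster models, with scikit-learn standing in for the Kriging machinery.

```python
# Sketch of cluster Kriging: partition the data into disjoint subsets,
# fit one Gaussian process (Kriging model) per subset, and combine the
# per-model predictions with weights inversely proportional to each
# model's predictive variance.  The weighting rule and library choices
# are illustrative assumptions, not the paper's exact method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor

def fit_cluster_kriging(X, y, k=4, seed=0):
    """Fit k independent Kriging models on k-means clusters of the data."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    models = []
    for c in range(k):
        gp = GaussianProcessRegressor(alpha=1e-2, normalize_y=True)
        gp.fit(X[labels == c], y[labels == c])
        models.append(gp)
    return models

def predict_cluster_kriging(models, X_new):
    """Combine per-cluster predictions, weighted by inverse variance."""
    means, variances = [], []
    for gp in models:
        mu, sd = gp.predict(X_new, return_std=True)
        means.append(mu)
        variances.append(sd ** 2 + 1e-12)   # guard against division by zero
    means, variances = np.array(means), np.array(variances)
    w = 1.0 / variances
    w /= w.sum(axis=0)                      # normalize weights per query point
    return (w * means).sum(axis=0)

# Toy usage: 200 noisy samples of sin(x), split across 4 local models.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
models = fit_cluster_kriging(X, y, k=4)
pred = predict_cluster_kriging(models, np.array([[0.5]]))
```

Because each model is trained on roughly n/k points, the cubic inversion cost applies only per cluster, and the k fits are independent of one another, which is what makes the scheme easy to parallelize.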


Notes

  1. There are asymptotically faster algorithms for inverting a matrix, e.g. Strassen's \(O(n^{2.807})\) and Stothers' \(O(n^{2.373})\).
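The complexity saving behind the clustered approach can be made concrete with a back-of-the-envelope calculation (the figures below are illustrative, not taken from the paper):

```python
# Fitting one global Kriging model costs O(n^3) for the covariance-matrix
# inversion.  Splitting n samples into k equal clusters replaces this with
# k inversions of size n/k, i.e. k * (n/k)^3 = n^3 / k^2 operations --
# a k^2-fold reduction in the dominant term.
n, k = 10_000, 10
full_cost = n ** 3                  # one n x n inversion
clustered_cost = k * (n // k) ** 3  # k inversions of size n/k
speedup = full_cost // clustered_cost
```

Memory for the covariance matrices drops similarly, from \(n^2\) entries to \(k \cdot (n/k)^2 = n^2/k\).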

Author information

Correspondence to Bas van Stein.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

van Stein, B., Wang, H., Kowalczyk, W., Bäck, T., Emmerich, M. (2015). Optimally Weighted Cluster Kriging for Big Data Regression. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds) Advances in Intelligent Data Analysis XIV. IDA 2015. Lecture Notes in Computer Science, vol. 9385. Springer, Cham. https://doi.org/10.1007/978-3-319-24465-5_27

  • DOI: https://doi.org/10.1007/978-3-319-24465-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24464-8

  • Online ISBN: 978-3-319-24465-5

  • eBook Packages: Computer Science (R0)
