DOI:10.1145/3340531.3412182
short-paper

Automatic Gaussian Process Model Retrieval for Big Data

Published: 19 October 2020

Abstract

Gaussian Process Models (GPMs) are widely regarded as a prominent tool for capturing the inherent characteristics of data. These Bayesian machine learning models support data analysis tasks such as regression and classification. Finding an optimal model for a given dataset usually requires a process of automatic GPM retrieval, even though prevailing default instantiations and, in some scenarios, existing prior knowledge can shortcut the way to an optimal GPM. Since non-approximative Gaussian Processes can only process small datasets with low statistical versatility, we propose a new approach that enables efficient, automatic retrieval of GPMs for large-scale data. The resulting model is composed of independent statistical representations for non-overlapping segments of the given data. Our performance evaluation of the new approach demonstrates the quality of the resulting models, which clearly outperform default GPM instantiations while maintaining reasonable model training time.
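The segment-wise composition described in the abstract can be pictured with the following Python sketch. It is not the authors' retrieval algorithm: the fixed RBF kernel, the equal-sized split, the hyperparameter values, and the names LocalGP, fit_segmented_gp, and predict_segmented_gp are all assumptions made for illustration. The sketch only shows why independent models on non-overlapping segments are cheaper than one exact GP: exact training costs O(n^3), whereas k segments of n/k points each cost roughly k(n/k)^3 = n^3/k^2.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

class LocalGP:
    """Exact GP regression on one segment (hypothetical helper, fixed kernel)."""

    def __init__(self, x, y, noise=1e-2, lengthscale=1.0):
        self.x, self.lengthscale = x, lengthscale
        self.mean = y.mean()  # constant mean, estimated per segment
        K = rbf_kernel(x, x, lengthscale) + noise * np.eye(len(x))
        L = np.linalg.cholesky(K)  # cubic in the segment size
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - self.mean))

    def predict(self, x_new):
        """Posterior mean at x_new."""
        return self.mean + rbf_kernel(x_new, self.x, self.lengthscale) @ self.alpha

def fit_segmented_gp(x, y, n_segments):
    """Split sorted data into non-overlapping segments; fit one GP per segment."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parts = np.array_split(np.arange(len(x)), n_segments)
    models = [LocalGP(x[idx], y[idx]) for idx in parts]
    # Upper input boundary of every segment except the last one.
    bounds = np.array([x[idx[-1]] for idx in parts[:-1]])
    return models, bounds

def predict_segmented_gp(models, bounds, x_new):
    """Route each query point to the model of the segment that contains it."""
    seg = np.searchsorted(bounds, x_new, side="left")
    out = np.empty(len(x_new))
    for k, model in enumerate(models):
        mask = seg == k
        if mask.any():
            out[mask] = model.predict(x_new[mask])
    return out

# Toy usage on synthetic data (assumed, not taken from the paper).
x = np.linspace(0.0, 20.0, 400)
y = np.sin(x)
models, bounds = fit_segmented_gp(x, y, n_segments=4)
pred = predict_segmented_gp(models, bounds, x)
```

Because the local models share nothing, they can be trained in parallel. The price is that predictions may be discontinuous at segment boundaries, and this sketch makes no attempt to choose the boundaries or the kernel from the data.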

Supplementary Material

MP4 File (3340531.3412182.mp4)


Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN:9781450368599
DOI:10.1145/3340531

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian machine learning
  2. gaussian processes
  3. information retrieval
  4. performance evaluation
  5. regression

Qualifiers

  • Short-paper

Conference

CIKM '20

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 32
  • Downloads (last 6 weeks): 1

Reflects downloads up to 20 Jan 2025

Cited By

  • (2023) On Kernel Search Based Gaussian Process Anomaly Detection. Innovative Intelligent Industrial Production and Logistics, 1-23. DOI:10.1007/978-3-031-37228-5_1. Online publication date: 7-Jul-2023
  • (2022) Constraining Gaussian processes to systems of linear ordinary differential equations. Proceedings of the 36th International Conference on Neural Information Processing Systems, 29386-29399. DOI:10.5555/3600270.3602401. Online publication date: 28-Nov-2022
  • (2022) Automated Model Inference for Gaussian Processes: An Overview of State-of-the-Art Methods and Algorithms. SN Computer Science 3:4. DOI:10.1007/s42979-022-01186-x. Online publication date: 21-May-2022
  • (2022) Dynamically Self-adjusting Gaussian Processes for Data Stream Modelling. KI 2022: Advances in Artificial Intelligence, 96-114. DOI:10.1007/978-3-031-15791-2_10. Online publication date: 19-Sep-2022
  • (2021) 3CS Algorithm for Efficient Gaussian Process Model Retrieval. 2020 25th International Conference on Pattern Recognition (ICPR), 1773-1780. DOI:10.1109/ICPR48806.2021.9412805. Online publication date: 10-Jan-2021
  • (2021) LOGIC: Probabilistic Machine Learning for Time Series Classification. 2021 IEEE International Conference on Data Mining (ICDM), 1000-1005. DOI:10.1109/ICDM51629.2021.00113. Online publication date: Dec-2021
  • (2021) Machine Learning for Storage Location Prediction in Industrial High Bay Warehouses. Pattern Recognition. ICPR International Workshops and Challenges, 650-661. DOI:10.1007/978-3-030-68799-1_47. Online publication date: 10-Jan-2021
