
State Space Approximation of Gaussian Processes for Time Series Forecasting

  • Conference paper
  • In: Advanced Analytics and Learning on Temporal Data (AALTD 2021)

Abstract

Gaussian Processes (GPs), equipped with a sufficiently rich additive kernel, provide competitive results in time series forecasting compared to state-of-the-art approaches (arima, ETS), provided that (i) during training the unnecessary components of the kernel are made irrelevant by automatic relevance determination, and (ii) priors are assigned to each hyperparameter. However, the computational complexity of GPs grows cubically in time and quadratically in memory with the number of observations. The state space (SS) approximation of GPs allows GP-based inference to be computed with linear complexity. In this paper, we apply the SS representation to time series forecasting, showing that SS models achieve performance comparable to that of the full GP and better than state-of-the-art models (arima, ETS). Moreover, the SS representation allows us to derive new models, for instance by combining ETS with kernels.
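As a minimal illustration of the SS idea (a sketch of ours, not the authors' implementation): for the Matérn-1/2 (Ornstein-Uhlenbeck) kernel \(k(\tau)=\sigma^2 e^{-|\tau|/\ell}\), the GP is the solution of a linear stochastic differential equation, so exact GP inference reduces to a Kalman filter running in O(n) time. The function name and hyperparameter values below are illustrative.

    import numpy as np

    def kalman_gp_matern12(t, y, ell=1.0, sigma2=1.0, noise2=0.1):
        # O(n) GP inference for the Matern-1/2 (Ornstein-Uhlenbeck) kernel
        # k(tau) = sigma2 * exp(-|tau|/ell) plus white observation noise.
        # Returns one-step-ahead predictive means and variances.
        m, P = 0.0, sigma2                 # stationary prior on the scalar state
        means, variances = [], []
        t_prev = t[0]
        for tk, yk in zip(t, y):
            dt = tk - t_prev
            a = np.exp(-dt / ell)          # transition e^{F dt}, with F = -1/ell
            q = sigma2 * (1.0 - a * a)     # exactly discretised process noise
            m, P = a * m, a * a * P + q    # predict step
            S = P + noise2                 # innovation (predictive) variance
            means.append(m)
            variances.append(S)
            K = P / S                      # Kalman gain (observation matrix H = 1)
            m += K * (yk - m)              # update step
            P *= 1.0 - K
            t_prev = tk
        return np.array(means), np.array(variances)

    # Usage on synthetic data:
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + 0.3 * rng.standard_normal(t.size)
    mu, var = kalman_gp_matern12(t, y, ell=2.0, sigma2=1.0, noise2=0.09)

The same recursion, run with the richer state space models discussed in the paper, yields forecasts by iterating the predict step beyond the last observation.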


Notes

  1. A GP prior with zero mean function and covariance function \(k_{\boldsymbol{\theta}}:\mathbb{R}^p\times \mathbb{R}^p \rightarrow \mathbb{R}^+\), which depends on a vector of hyperparameters \(\boldsymbol{\theta}\).

  2. In this work, we include the additive noise \(v\) in the kernel by adding a white-noise kernel term.

  3. A stationary kernel is one that is translation invariant: \(k_{\boldsymbol{\theta}}(x_1, x_2)\) depends only on \(x_1-x_2\); the Matérn and RBF kernels are examples (their standard forms are recalled after these notes).

  4. m is a latent dimension which defines the dimension of the state space; the state is a function of time.

  5. The matrix exponential is \(e^A=I+A+A^2/2!+A^3/3!+\dots\) and, for many matrices A, it can be computed analytically (a small numerical check is sketched after these notes).

  6. We also tried a more accurate approximation of the periodic kernel (11 COS kernels), but it did not provide significantly better performance in the M3 competition.

  7. In both cases, we estimated the kernels' hyperparameters using MAP (the objective is recalled after these notes).

  8. For the variances of Holt's model, we use the same priors as in Table 2. For \(\alpha\) and \(\beta\), we use the priors \(\text{Beta}(1,1.4)\) and \(\text{Beta}(1,11.4)\), respectively. We learned the parameters of these priors using a hierarchical model similar to the one described in [3].

  9. In contrast to arima and ETS, GP and SS models can easily handle non-integer seasonalities, such as those in the Electricity dataset; see [3] for more details.
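For concreteness regarding note 3, the standard forms of two stationary kernels, as given in [15], are

\[ k_{\text{RBF}}(x_1,x_2)=\sigma^2\exp\left(-\frac{(x_1-x_2)^2}{2\ell^2}\right), \qquad k_{\text{Matérn-}1/2}(x_1,x_2)=\sigma^2\exp\left(-\frac{|x_1-x_2|}{\ell}\right), \]

both of which depend on \(x_1, x_2\) only through the difference \(x_1-x_2\).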
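As a small check of note 5 (a sketch under illustrative parameter values of our choosing, using the standard Matérn-3/2 state space construction from the literature, e.g. [18]), the transition matrix \(e^{F\,\Delta t}\) can be obtained numerically with scipy.linalg.expm and compared against its closed form:

    import numpy as np
    from scipy.linalg import expm

    ell = 2.0                   # illustrative lengthscale
    lam = np.sqrt(3.0) / ell    # Matern-3/2 rate parameter
    F = np.array([[0.0, 1.0],
                  [-lam**2, -2.0 * lam]])   # SDE feedback matrix

    dt = 0.5                    # illustrative time step between observations
    A = expm(F * dt)            # numerical transition matrix e^{F dt}

    # F has a double eigenvalue -lam and (F + lam*I)^2 = 0, so the matrix
    # exponential has the closed form e^{-lam*dt} * (I + (F + lam*I) * dt):
    A_closed = np.exp(-lam * dt) * (np.eye(2) + (F + lam * np.eye(2)) * dt)
    assert np.allclose(A, A_closed)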
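Regarding note 7, MAP estimation selects the hyperparameters maximising the log marginal likelihood plus the log prior (standard material, see [15]); in the SS representation the first term is computable in O(n) via the Kalman filter's prediction-error decomposition:

\[ \boldsymbol{\theta}_{\text{MAP}} = \arg\max_{\boldsymbol{\theta}} \left[ \log p(\mathbf{y}\mid \boldsymbol{\theta}) + \log p(\boldsymbol{\theta}) \right]. \]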

References

  1. Bauer, M., van der Wilk, M., Rasmussen, C.E.: Understanding probabilistic sparse Gaussian process approximations. In: Advances in Neural Information Processing Systems, pp. 1533–1541 (2016)


  2. Benavoli, A., Zaffalon, M.: State Space representation of non-stationary Gaussian processes. arXiv preprint arXiv:1601.01544 (2016)

  3. Corani, G., Benavoli, A., Zaffalon, M.: Time series forecasting with Gaussian Processes needs priors. In: Proceedings of the ECML PKDD (2021, accepted). https://arxiv.org/abs/2009.08102

  4. Foreman-Mackey, D., Agol, E., Ambikasaran, S., Angus, R.: Fast and scalable Gaussian process modeling with applications to astronomical time series. Astron. J. 154(6), 220 (2017)


  5. Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102(477), 359–378 (2007)


  6. Hensman, J., Fusi, N., Lawrence, N.D.: Gaussian processes for big data. In: Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013, pp. 282–290. AUAI Press, Arlington (2013)


  7. Hernández-Lobato, D., Hernández-Lobato, J.M.: Scalable Gaussian process classification via expectation propagation. In: Artificial Intelligence and Statistics, pp. 168–176 (2016)


  8. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice, 2nd edn. OTexts, Melbourne (2018). OTexts.com/fpp2

  9. Hyndman, R.J., Khandakar, Y.: Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 27(3), 1–22 (2008). http://www.jstatsoft.org/article/view/v027i03

  10. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Courier Corporation, New York (2007)


  11. Karvonen, T., Särkkä, S.: Approximate state-space Gaussian processes via spectral transformation. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2016)


  12. Lloyd, J.R.: GEFCom2012 hierarchical load forecasting: gradient boosting machines and Gaussian processes. Int. J. Forecast. 30(2), 369–374 (2014)


  13. Loper, J., Blei, D., Cunningham, J.P., Paninski, L.: General linear-time inference for Gaussian processes on one dimension. arXiv preprint arXiv:2003.05554 (2020)

  14. Quiñonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)


  15. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006)


  16. Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., Aigrain, S.: Gaussian processes for time-series modelling. Philos. Trans. Royal Soc. A Math. Phys. Eng. Sci. 371(1984), 20110550 (2013)


  17. Särkkä, S., Hartikainen, J.: Infinite-dimensional Kalman filtering approach to spatio-temporal Gaussian process regression. In: International Conference on Artificial Intelligence and Statistics, pp. 993–1001 (2012)


  18. Särkkä, S., Solin, A., Hartikainen, J.: Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing: a look at Gaussian process regression through Kalman filtering. IEEE Signal Process. Mag. 30(4), 51–61 (2013)


  19. Schuerch, M., Azzimonti, D., Benavoli, A., Zaffalon, M.: Recursive estimation for sparse Gaussian process regression. Automatica 120, 109–127 (2020)


  20. Snelson, E., Ghahramani, Z.: Sparse Gaussian processes using pseudo-inputs. In: Advances in Neural Information Processing Systems, pp. 1257–1264 (2006)


  21. Solin, A., Särkkä, S.: Explicit link between periodic covariance functions and state space models. In: Artificial Intelligence and Statistics, pp. 904–912. PMLR (2014)


  22. Solin, A., Särkkä, S.: Gaussian quadratures for state space approximation of scale mixtures of squared exponential covariance functions. In: 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2014)


  23. Taylor, S.J., Letham, B.: Forecasting at scale. Am. Stat. 72(1), 37–45 (2018)


  24. Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. In: van Dyk, D., Welling, M. (eds.) Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 April 2009, vol. 5, pp. 567–574 (2009)


  25. Wilson, A., Adams, R.: Gaussian process kernels for pattern discovery and extrapolation. In: International Conference on Machine Learning, pp. 1067–1075. PMLR (2013)


  26. Wood, S.N.: Generalized Additive Models: An Introduction with R. CRC Press, Boca Raton (2017)



Acknowledgements

The authors acknowledge support from the Swiss National Research Programme 75 “Big Data” Grant No. 407540_167199/1.

Author information

Correspondence to Alessio Benavoli.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Benavoli, A., Corani, G. (2021). State Space Approximation of Gaussian Processes for Time Series Forecasting. In: Lemaire, V., Malinowski, S., Bagnall, A., Guyet, T., Tavenard, R., Ifrim, G. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2021. Lecture Notes in Computer Science, vol. 13114. Springer, Cham. https://doi.org/10.1007/978-3-030-91445-5_2


  • DOI: https://doi.org/10.1007/978-3-030-91445-5_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91444-8

  • Online ISBN: 978-3-030-91445-5

  • eBook Packages: Computer Science, Computer Science (R0)
