Abstract
Multivariate time series forecasting (MTSF) has been studied extensively over the years, with ubiquitous applications in finance, traffic, the environment, etc. Recent investigations have demonstrated the potential of the Transformer to improve forecasting performance. The Transformer, however, has limitations that prevent it from being applied directly to MTSF, such as insufficient extraction of temporal patterns at different time scales, extraction of irrelevant information in self-attention, and no targeted processing of static covariates. Motivated by the above, an enhanced Transformer-based framework for MTSF is proposed, named Foreformer, with three distinctive characteristics: (i) a multi-temporal resolution module that deeply captures temporal patterns at different scales, (ii) an explicit sparse attention mechanism that forces the model to prioritize the most contributive components, and (iii) a static covariates processing module for nonlinear processing of static covariates. Extensive experiments on three real-world datasets demonstrate that Foreformer outperforms existing methodologies, making it a reliable approach for MTSF tasks.
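The abstract does not specify how the explicit sparse attention is implemented. As an illustration only, one common way to force attention onto the most contributive components is to keep the top-k scores per query and mask the rest before the softmax; the function name, the top-k rule, and the parameter k below are assumptions, not the authors' actual mechanism.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=2):
    """Scaled dot-product attention that keeps only the top-k scores per
    query and masks out the rest (an illustrative 'explicit sparse' variant,
    not necessarily the Foreformer mechanism)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_queries, n_keys)
    # Threshold each row at its k-th largest score; mask everything below it.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Row-wise softmax over the surviving scores only.
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries, model dim 8
K = rng.normal(size=(6, 8))   # 6 keys/values
V = rng.normal(size=(6, 8))
out, w = topk_sparse_attention(Q, K, V, k=2)
print(out.shape)              # (4, 8)
print((w > 0).sum(axis=-1))   # each query attends to exactly 2 keys
```

Because the masked entries receive zero weight, each output row mixes at most k value vectors, which is the sense in which the model is "forced to prioritize the most contributive components".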
Acknowledgements
This work was supported in part by the Major Scientific Project of Zhejiang Laboratory under Grant 2020MC0AE01, in part by the Fundamental Research Funds for the Central Universities (Zhejiang University New Generation Industrial Control System (NGICS) Platform) under Grant ZJUNGICS2021010, and in part by the Zhejiang University Robotics Institute (Yuyao) Project under Grant K12001.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest regarding this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Lu, J. Foreformer: an enhanced transformer-based framework for multivariate time series forecasting. Appl Intell 53, 12521–12540 (2023). https://doi.org/10.1007/s10489-022-04100-3