Abstract
Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make informed decisions. Recent research has demonstrated the superior performance of transformer-based architectures, especially in the regime of far-horizon time series forecasting. However, current state-of-the-art sparse Transformer architectures fail to couple the down- and upsampling procedures needed to produce outputs at the same resolution as the input. We propose a U-Net inspired Transformer architecture named Yformer, based on a novel Y-shaped encoder-decoder that (1) uses direct connections from each downscaled encoder layer to the corresponding upsampled decoder layer, as in a U-Net, (2) combines downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with an auxiliary reconstruction loss. Extensive experiments against relevant baselines on three benchmark datasets demonstrate average improvements of 19.82% and 18.41% in MSE and 13.62% and 11.85% in MAE for the univariate and the multivariate settings, respectively.
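The abstract names three architectural ideas: U-Net-style skip connections between encoder and decoder scales, downscaling/upsampling combined with attention, and an auxiliary reconstruction loss added to the forecasting objective. The PyTorch sketch below is a minimal illustration of how those pieces can fit together; it is not the authors' Yformer implementation. Standard multi-head attention stands in for the sparse attention mentioned in the abstract, the module and head names (YShapedForecaster, Block, forecast_head, recon_head) are invented for this example, and the forecast horizon is assumed to equal the input length for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    """Self-attention followed by a position-wise feed-forward layer."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Linear(d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        return self.norm2(x + self.ff(x))


class YShapedForecaster(nn.Module):
    """Encoder downsamples, decoder upsamples back to the input resolution,
    and skip connections link matching scales (U-Net style)."""
    def __init__(self, d_model=64, depth=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        self.enc = nn.ModuleList([Block(d_model) for _ in range(depth)])
        self.down = nn.ModuleList([nn.Conv1d(d_model, d_model, 3, 2, 1) for _ in range(depth)])
        self.dec = nn.ModuleList([Block(d_model) for _ in range(depth)])
        self.up = nn.ModuleList([nn.ConvTranspose1d(d_model, d_model, 4, 2, 1) for _ in range(depth)])
        self.fuse = nn.ModuleList([nn.Linear(2 * d_model, d_model) for _ in range(depth)])
        self.forecast_head = nn.Linear(d_model, 1)   # predicts the future window
        self.recon_head = nn.Linear(d_model, 1)      # auxiliary reconstruction of the input

    def forward(self, x):                            # x: (batch, length, 1)
        h, skips = self.embed(x), []
        for blk, down in zip(self.enc, self.down):
            h = blk(h)
            skips.append(h)                          # keep each scale for the decoder
            h = down(h.transpose(1, 2)).transpose(1, 2)        # halve the sequence length
        for blk, up, fuse, s in zip(self.dec, self.up, self.fuse, reversed(skips)):
            h = up(h.transpose(1, 2)).transpose(1, 2)          # double the sequence length
            h = fuse(torch.cat([h[:, :s.size(1)], s], dim=-1)) # direct skip connection
            h = blk(h)
        return self.forecast_head(h), self.recon_head(h)


model = YShapedForecaster()
past = torch.randn(8, 96, 1)        # historical window
future = torch.randn(8, 96, 1)      # ground-truth forecast horizon (same length here)
forecast, recon = model(past)
# forecasting objective plus the auxiliary reconstruction loss from the abstract
loss = F.mse_loss(forecast, future) + 0.5 * F.mse_loss(recon, past)
loss.backward()
```

The reconstruction head gives the decoder a second, easier target (the observed input), which is the stabilizing role the auxiliary loss plays in the abstract; a full model would additionally use sparse attention and separate past/future windows.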
Acknowledgements
This work was supported by the Federal Ministry for Economic Affairs and Climate Action (BMWK), Germany, within the framework of the IIP-Ecosphere project (project number: 01MK20006D).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Madhusudhanan, K., Burchert, J., Duong-Trung, N., Born, S., Schmidt-Thieme, L. (2023). U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13718. Springer, Cham. https://doi.org/10.1007/978-3-031-26422-1_3
DOI: https://doi.org/10.1007/978-3-031-26422-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26421-4
Online ISBN: 978-3-031-26422-1