InTrans: Fast Incremental Transformer for Time Series Data Prediction

  • Conference paper
Database and Expert Systems Applications (DEXA 2022)

Abstract

Predicting time-series data is useful in many applications, such as natural disaster prevention, weather forecasting, and traffic control systems. Time-series forecasting has been extensively studied. Many existing forecasting models perform well when predicting short sequence time-series, but their performance degrades greatly on long sequences. Recently, more dedicated research has been devoted to this direction, and Informer is currently the most efficient prediction model. The main drawback of Informer is its inability to learn incrementally. This paper proposes an incremental Transformer, called InTrans, to address this bottleneck by reducing the training/predicting time of Informer. The time complexities of InTrans compared to Informer are: (1) O(S) vs. O(L) for positional and temporal embedding, (2) \(O((S+k-1)*k)\) vs. \(O(L*k)\) for value embedding, and (3) \(O((S+k-1)*d_{dim})\) vs. \(O(L*d_{dim})\) for the computation of Query/Key/Value, where L is the length of the input; k is the kernel size; \(d_{dim}\) is the number of dimensions; and S is the length of the non-overlapping part of the input, which is usually significantly smaller than L. Therefore, InTrans can greatly improve both training and predicting speed over the state-of-the-art model, Informer. Extensive experiments have shown that InTrans is about 26% faster than Informer for both short sequence and long sequence time-series prediction.

S. Bou—Due to name change, Savong Bou is now known as Takehiko Hashimoto.

Notes

  1. https://github.com/zhouhaoyi/ETDataset.

  2. https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014.

  3. https://www.ncei.noaa.gov/data/local-climatological-data/.

References

  1. Ariyo, A.A., Adewumi, A.O., Ayo, C.K.: Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, pp. 106–112 (2014)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2016)

  3. Beltagy, I., Peters, M.E., Cohan, A.: Longformer: the long-document transformer. arXiv:2004.05150 (2020)

  4. Bou, S., Kitagawa, H., Amagasa, T.: L-Bix: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes. Knowl. Inf. Syst. 62(8), 3107–3131 (2020)

  5. Bou, S., Kitagawa, H., Amagasa, T.: Cpix: real-time analytics over out-of-order data streams by incremental sliding-window aggregation. IEEE Trans. Knowl. Data Eng. (2021). https://doi.org/10.1109/TKDE.2021.3054898

  6. Kitaev, N., Kaiser, L., Levskaya, A.: Reformer: the efficient transformer. In: International Conference on Learning Representations (2020)

  7. Lai, G., Chang, W.C., Yang, Y., Liu, H.: Modeling long- and short-term temporal patterns with deep neural networks (2018)

  8. Park, H.J., Kim, Y., Kim, H.Y.: Stock market forecasting using a multi-task approach integrating long short-term memory and the random forest framework. Appl. Soft Comput. 114, 108106 (2022)

  9. Salinas, D., Flunkert, V., Gasthaus, J., Januschowski, T.: DeepAR: probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 36(3), 1181–1191 (2020)

  10. Su, T., Pan, T., Chang, Y., Lin, S., Hao, M.: A hybrid fuzzy and k-nearest neighbor approach for debris flow disaster prevention. IEEE Access 10, 21787–21797 (2022). https://doi.org/10.1109/ACCESS.2022.3152906

  11. Taylor, S., Letham, B.: Forecasting at scale. Am. Stat. 72, 37–45 (2018)

  12. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)

  13. Zhou, H., et al.: Informer: beyond efficient transformer for long sequence time-series forecasting. In: The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, vol. 35, pp. 11106–11115. AAAI Press (2021)

Acknowledgements

This work was supported by the University of Tsukuba Basic Research Support Program Type A, Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP19H04114 and JP22H03694, the New Energy and Industrial Technology Development Organization (NEDO) Grant Number JPNP20006, and the Japan Agency for Medical Research and Development (AMED) Grant Number JP21zf0127005.

Author information

Corresponding author

Correspondence to Savong Bou.


Appendices

A Proof of Theorem 1

The inputs to Eq. 2 are positional and dimensional information. The dimensional information is the same for all records in the training data. Assume that we have two consecutive inputs i and \(i+1\), each consisting of L records, where the gap between the beginnings of inputs i and \(i+1\) is S records. The positional information of input i is \(POS_i=[pos_{(i-1)S+1}, pos_{(i-1)S+2},..., pos_{(i-1)S+S}, pos_{(i-1)S+S+1}, ..., pos_{(i-1)S+L}]\), and the positional information of input \(i+1\) is \(POS_{i+1}=[pos_{(i-1)S+S+1}, pos_{(i-1)S+S+2}, ..., pos_{(i-1)S+L}, pos_{(i-1)S+L+1}, ..., pos_{(i-1)S+L+S}]\). We have

  • \(POS_i=POSn_i \oplus POSo_i\), and

  • \(POS_{i+1}=POSo_{i+1} \oplus POSn_{i+1}\), where

    • \(POSn_i=[pos_{(i-1)S+1}, pos_{(i-1)S+2},..., pos_{(i-1)S+S}]\), and

    • \(POSo_i=POSo_{i+1}=[pos_{(i-1)S+S+1}, ..., pos_{(i-1)S+L}]\)

    • \(POSn_{i+1}=[pos_{(i-1)S+L+1}, ..., pos_{(i-1)S+L+S}]\)

PosEncoding(POS) represents encoding the POS by Eq. 2. We have:

  • \(Pn_{i}=PosEncoding(POSn_i)\), and

  • \(Po_{i}=PosEncoding(POSo_i)\), so

  • \(P_{i}=Pn_{i} \oplus Po_{i}\).

Because \(POSo_i=POSo_{i+1}\), we have \(Po_{i+1}=PosEncoding(POSo_{i+1})=PosEncoding(POSo_i)=Po_{i}\). Furthermore:

  • \(Pn_{i+1}=PosEncoding(POSn_{i+1})\), therefore

  • \(P_{i+1}= Po_{i+1} \oplus Pn_{i+1}= Po_{i} \oplus Pn_{i+1} \).

Theorem 1 is proven.
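
To make the reuse concrete, here is a minimal sketch, assuming Eq. 2 is the standard sinusoidal positional encoding of Vaswani et al. [12]; the function and variable names (pos_encoding, L, S, d_model) and the sizes are illustrative, not taken from the paper. It verifies that the encoding of input \(i+1\) can be assembled from the cached block \(Po_i\) plus the encodings of only the S new positions, which is the basis of the O(S) cost in Theorem 7.

```python
# Minimal sketch (NumPy): incremental reuse of sinusoidal positional encodings
# across two overlapping windows. Assumes Eq. 2 is the standard sinusoidal
# encoding of Vaswani et al.; all names and sizes here are illustrative.
import numpy as np

def pos_encoding(positions, d_model):
    """Sinusoidal encoding of a 1-D array of absolute positions."""
    pe = np.zeros((len(positions), d_model))
    div = np.exp(-np.log(10000.0) * np.arange(0, d_model, 2) / d_model)
    pe[:, 0::2] = np.sin(positions[:, None] * div)
    pe[:, 1::2] = np.cos(positions[:, None] * div)
    return pe

L, S, d_model = 96, 24, 8            # window length, shift, embedding size
pos_i  = np.arange(0, L)             # positions covered by input i
pos_i1 = np.arange(S, L + S)         # positions covered by input i+1

P_i = pos_encoding(pos_i, d_model)                    # Pn_i followed by Po_i
Pn_i1 = pos_encoding(pos_i1[-S:], d_model)            # only the S new positions
P_i1_incremental = np.concatenate([P_i[S:], Pn_i1])   # Po_i followed by Pn_{i+1}

# Matches the non-incremental encoding of input i+1.
assert np.allclose(P_i1_incremental, pos_encoding(pos_i1, d_model))
```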

B Proof of Theorem 2

The notations in Appendix A are also used in this section. For relative position, the positional information of all inputs/outputs is the same, so \(POS_1= POS_2=...= POS_i=POS_{i+1}=[pos_{1}, pos_{2},..., pos_{L}]\). Therefore, we have \(P_1= P_2=...=P_i=P_{i+1}=PosEncoding(POS_1)\). Theorem 2 is proven.

C Proof of Theorem 3

The temporal embedding takes temporal information of the records, such as week, month, and holiday, as the basis for embedding. The temporal information of a record does not change across different training samples. Therefore, the temporal embeddings of all records belonging to the overlapping part between inputs i and \(i+1\) are the same. The proof is similar to that of the absolute positional embedding in Appendix A. Thus, \(To_{i+1}=To_{i}\), so Theorem 3 is proven.

D Proof of Theorem 4

The notations in Appendix A are also used in this section. We need to prove that \(Vr_{i} = Vr_{i+1}\). When computing the value embedding of input i, the input is convolved by a kernel of width k. Since the default stride is one, the resulting embedding values of records \(x_{(i-1)S+1}\) to \(x_{(i-1)S+L-(k-1)}\) are fully convolved without including any padding values. Similarly, within input \(i+1\), the embedding values of records \(x_{(i-1)S+S+1}\) to \(x_{(i-1)S+L-(k-1)}\) are fully convolved. Therefore, the embedding values of records \(x_{(i-1)S+S+1}\) to \(x_{(i-1)S+L-(k-1)}\) are the same for both inputs i and \(i+1\), i.e., \(Vr_{i} = Vr_{i+1}\), which proves Theorem 4.
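
The argument can be illustrated with a minimal PyTorch sketch, under the simplifying assumption of no padding and a shared Conv1d with stride one; the sizes and variable names below are illustrative, not taken from the paper. It checks that the outputs that are fully convolved within the overlap of inputs i and \(i+1\) coincide, so per window only the outputs depending on new (or, in the padded setting of the paper, padded) records need to be recomputed, roughly \(S+k-1\) positions.

```python
# Minimal sketch (PyTorch): the fully convolved value embeddings over the
# overlap of two consecutive windows coincide when the same Conv1d (stride 1)
# is applied. Padding is omitted for clarity; all names/sizes are illustrative.
import torch
import torch.nn as nn

L, S, k, d_in, d_model = 96, 24, 3, 7, 16
conv = nn.Conv1d(d_in, d_model, kernel_size=k, stride=1, bias=False)

stream = torch.randn(1, d_in, L + S)     # (batch, channels, time)
x_i  = stream[:, :, :L]                  # input i
x_i1 = stream[:, :, S:]                  # input i+1, shifted by S records

with torch.no_grad():
    v_i, v_i1 = conv(x_i), conv(x_i1)

# Embeddings of the records that are fully convolved inside the overlap
# (i.e., excluding the last k-1 positions of the overlap) are identical.
reusable = L - S - (k - 1)
assert torch.allclose(v_i[:, :, S:S + reusable], v_i1[:, :, :reusable], atol=1e-6)
```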

E Proof of Theorem 5

Query, Key, and Value are computed as in Eq. 7. Such multiplication does not change the value distribution of the original input embedding. Therefore, Query, Key, and Value can be incrementally computed in a manner similar to the input embedding, as proved in Theorems 1, 3, and 4. Thus, Theorem 5 is proven.
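
As a small illustration (not the paper's implementation), the sketch below applies a position-wise linear map, standing in for the Query projection of Eq. 7 whose exact form is not reproduced here, to an embedding whose overlapping rows are reused; the names (W_q, emb_i) and sizes are illustrative. Identical embedding rows give identical Query rows, so only the S new rows need to be projected, and the same holds for Key and Value.

```python
# Minimal sketch (PyTorch): Query/Key/Value are position-wise linear maps of
# the input embedding, so reused embedding rows yield identical Q/K/V rows.
# Names and sizes are illustrative.
import torch
import torch.nn as nn

L, S, d_model = 96, 24, 16
W_q = nn.Linear(d_model, d_model, bias=False)   # stand-in for the Query projection

emb_i  = torch.randn(L, d_model)                              # embedding of input i
emb_i1 = torch.cat([emb_i[S:], torch.randn(S, d_model)], 0)   # reuse overlap + S new rows

with torch.no_grad():
    Q_i, Q_i1 = W_q(emb_i), W_q(emb_i1)

# Rows corresponding to the overlap are identical across the two windows.
assert torch.allclose(Q_i[S:], Q_i1[:L - S], atol=1e-6)
```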

F Proof of Theorem 6

InTrans is implemented on top of Informer by adopting the incremental computation of the temporal/positional/value embeddings and of the Query/Key/Value of each training sample. To prove that InTrans has the same predicting accuracy as Informer, we have to prove that the temporal/positional/value embeddings and the Query/Key/Value incrementally computed by InTrans are the same as those computed non-incrementally by Informer. Theorems 1–5 have shown exactly this. Theorem 6 is proven.

G Proof of Theorem 7

Equations 4 and 5 suggest that the positional and temporal embeddings can be incrementally computed, which is proved in Theorems 1 and 3. For each input \(i+1\), the positional embedding (\(Po_i\)) and the temporal embedding (\(To_i\)) corresponding to the overlapping part between inputs i and \(i+1\), computed when embedding input i, can be reused for input \(i+1\). Therefore, only the positional embedding (\(Pn_{i+1}\)) and the temporal embedding (\(Tn_{i+1}\)) corresponding to the non-overlapping part between inputs i and \(i+1\) need to be computed when embedding input \(i+1\). The size of the non-overlapping part is S, so the time complexity to compute the positional and temporal embedding of each input is O(S).

Similarly, the value embedding is done using Conv1d. Equation 6 and Theorem 4 suggest that, after computing the value embedding of input i, the value embedding of the overlapping part between inputs i and \(i+1\) (excluding the value embedding of the last \(k-1\) records) can be reused when embedding input \(i+1\). Therefore, the time complexity to compute the value embedding of each input is \(O((S+k-1)*k)\).

Similar to the value embedding, Eq. 6 and Theorem 4 suggest that the time complexity to compute Query, Key, or Value is \(O((S+k-1)*d_{dim})\). Theorem 7 is proven.
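
As a rough, purely illustrative comparison (the concrete values of L, S, k, and \(d_{dim}\) below are hypothetical, not taken from the paper's experiments), the per-window operation counts implied by Theorem 7 scale as follows:

```python
# Hypothetical back-of-the-envelope comparison of per-window work,
# following the complexity terms of Theorem 7. Values are illustrative only.
L, S, k, d_dim = 96, 24, 3, 512

costs = {
    "positional/temporal embedding": (L,          S),
    "value embedding (Conv1d)":      (L * k,      (S + k - 1) * k),
    "Query/Key/Value projection":    (L * d_dim,  (S + k - 1) * d_dim),
}
for name, (informer, intrans) in costs.items():
    print(f"{name}: Informer ~{informer} vs. InTrans ~{intrans}")
```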

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bou, S., Amagasa, T., Kitagawa, H. (2022). InTrans: Fast Incremental Transformer for Time Series Data Prediction. In: Strauss, C., Cuzzocrea, A., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2022. Lecture Notes in Computer Science, vol 13427. Springer, Cham. https://doi.org/10.1007/978-3-031-12426-6_4

  • DOI: https://doi.org/10.1007/978-3-031-12426-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12425-9

  • Online ISBN: 978-3-031-12426-6
