Abstract
Accurate time series forecasting is an essential task in many application domains. Real-world time series often contain complex non-linear patterns that prevent conventional forecasting techniques from producing accurate predictions. To forecast a given time series accurately, this study proposes a hybrid model based on two deep learning methods: long short-term memory (LSTM) and multi-head attention. The proposed method leverages the representations learned by both techniques. Its performance is compared, on 16 datasets, with standard time series forecasting techniques and with hybrid models proposed in the related literature. Individual models based on LSTM alone and on multi-head attention alone are also implemented to make the evaluation comprehensive. The experimental results indicate that the proposed model outperforms all benchmark methods on most datasets in terms of symmetric mean absolute percentage error (SMAPE) and yields the best average rank (AR) among the methods considered. The results also show that the model based on multi-head attention is the second-best method with regard to AR, which demonstrates the predictive power of the attention mechanism in time series forecasting.
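The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions: the abstract does not give the exact SMAPE formula, so a common variant is used (mean of |F - A| / ((|A| + |F|) / 2), in percent), and AR is taken to be the mean of per-dataset ranks with tied methods sharing the average of their ranks. The function names `smape` and `average_ranks` and the zero-denominator convention are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (range 0 to 200); one common variant, assumed here."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumed convention: a term whose denominator is zero contributes 0.
        total += abs(f - a) / denom if denom != 0 else 0.0
    return 100.0 * total / len(actual)


def average_ranks(errors: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean rank of each method across datasets (rank 1 = lowest error).

    errors[method][i] is the error (e.g. SMAPE) of `method` on dataset i.
    Tied methods share the average of the ranks they jointly occupy.
    """
    methods = list(errors)
    n_datasets = len(errors[methods[0]])
    totals = {m: 0.0 for m in methods}
    for i in range(n_datasets):
        ordered = sorted(methods, key=lambda m: errors[m][i])
        pos = 0
        while pos < len(ordered):
            # Extend `end` over the run of methods tied with ordered[pos].
            end = pos
            while (end + 1 < len(ordered)
                   and errors[ordered[end + 1]][i] == errors[ordered[pos]][i]):
                end += 1
            shared_rank = ((pos + 1) + (end + 1)) / 2.0  # mean of 1-based ranks
            for m in ordered[pos:end + 1]:
                totals[m] += shared_rank
            pos = end + 1
    return {m: totals[m] / n_datasets for m in methods}
```

Given a table of SMAPE values with one row per method and one column per dataset, `average_ranks` returns a single score per method, and the method with the lowest score has the best AR. Ranking per dataset before averaging keeps one dataset with unusually large errors from dominating the comparison.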
Cite this article
Abbasimehr, H., Paki, R. Improving time series forecasting using LSTM and attention models. J Ambient Intell Human Comput 13, 673–691 (2022). https://doi.org/10.1007/s12652-020-02761-x
DOI: https://doi.org/10.1007/s12652-020-02761-x