Abstract
Missing values are common in Internet of Things (IoT) environments for various reasons, including scheduled maintenance and sensor malfunction. In IoT time-series prediction, missing values may be related to the target labels, and their missing patterns can constitute informative missingness. Missing values are therefore a barrier to accurate prediction and analysis in IoT data mining. Although several methods have been proposed to estimate missing values, few studies have compared interpolation methods based on conventional models with those based on deep learning models, and there has been relatively little research on interpolation in the IoT environment specifically. To address these gaps, this paper applies linear regression, support vector regression, artificial neural networks, and long short-term memory to predict missing values in time series. Finally, a full comparison and analysis of the interpolation methods are presented. We believe these findings can inform future work on IoT applications.
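To make the simplest of these baselines concrete, the sketch below imputes gaps in a sensor series with ordinary linear regression on the time index. This is an illustrative example only, not the paper's actual pipeline: the function name and the sample readings are hypothetical, and a real IoT setting would use richer features (lagged values, neighbouring stations) as the article's SVR, ANN, and LSTM models do.

```python
import numpy as np

def linear_regression_impute(series):
    """Fill NaN gaps by fitting y = a*t + b to the observed samples.

    A minimal linear-regression interpolation baseline: fit a straight
    line through the non-missing points, then evaluate it at the
    missing time indices.
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    observed = ~np.isnan(series)
    # Least-squares fit of a*t + b using only the observed samples
    a, b = np.polyfit(t[observed], series[observed], deg=1)
    filled = series.copy()
    filled[~observed] = a * t[~observed] + b
    return filled

# Hourly readings with two gaps (illustrative values, not real data)
readings = [10.0, 12.0, np.nan, 16.0, 18.0, np.nan, 22.0]
print(linear_regression_impute(readings))
# → [10. 12. 14. 16. 18. 20. 22.]
```

Because the observed points here lie exactly on a line, the gaps are recovered perfectly; on real air-quality data the residual error of this baseline is what the more flexible nonlinear models aim to reduce.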
Acknowledgements
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. (Grant Numbers MOST 108-2218-E-025-002-MY3 and MOST 108-2218-E-001-001).
Cite this article
Yen, N.Y., Chang, JW., Liao, JY. et al. Analysis of interpolation algorithms for the missing values in IoT time series: a case of air quality in Taiwan. J Supercomput 76, 6475–6500 (2020). https://doi.org/10.1007/s11227-019-02991-7