Analysis of interpolation algorithms for the missing values in IoT time series: a case of air quality in Taiwan

The Journal of Supercomputing

Abstract

Missing values are common in Internet of Things (IoT) environments for various reasons, including regular maintenance and device malfunction. In IoT time-series prediction, missing values may be related to the target labels, and their missing patterns produce informative missingness. Missing values can therefore be a barrier to accurate prediction and analysis in IoT data mining. Although several methods have been proposed to estimate missing values, few studies have compared interpolation methods across conventional and deep learning models, and there has thus far been relatively little research on interpolation in the IoT environment. To address these problems, this paper applies linear regression, support vector regression, artificial neural networks, and long short-term memory to time-series prediction of missing values. Finally, a full comparison and analysis of the interpolation methods is presented. We believe these findings can be of value to future work on IoT applications.
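As a minimal sketch of the general idea (not the paper's implementation), the simplest of the four model families compared here, linear regression, can impute a missing reading by fitting a line to the observed points around it on the time index; the function name and the example values below are illustrative only:

```python
# Minimal sketch, assuming a univariate series with gaps marked as None:
# fit ordinary least-squares y = a*t + b to the observed (time, value)
# pairs, then fill each gap with the fitted value at its time index.

def linear_impute(series):
    """Fill None entries via least-squares linear regression on the time index."""
    obs = [(t, y) for t, y in enumerate(series) if y is not None]
    n = len(obs)
    mean_t = sum(t for t, _ in obs) / n
    mean_y = sum(y for _, y in obs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in obs)
    var = sum((t - mean_t) ** 2 for t, _ in obs)
    a = cov / var          # slope
    b = mean_y - a * mean_t  # intercept
    return [y if y is not None else a * t + b for t, y in enumerate(series)]

# Hourly sensor-style readings with one dropout at index 2:
filled = linear_impute([10.0, 12.0, None, 16.0, 18.0])
print(filled[2])  # -> 14.0 (the observed points lie on y = 2t + 10)
```

The SVR, ANN, and LSTM models studied in the paper replace this linear fit with progressively more expressive function approximators over the same observed/missing split.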



Acknowledgements

This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. (Grant Numbers MOST 108-2218-E-025-002-MY3 and MOST 108-2218-E-001-001).

Corresponding author

Correspondence to Jia-Wei Chang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Yen, N.Y., Chang, JW., Liao, JY. et al. Analysis of interpolation algorithms for the missing values in IoT time series: a case of air quality in Taiwan. J Supercomput 76, 6475–6500 (2020). https://doi.org/10.1007/s11227-019-02991-7
