Abstract
Missing values are common in Internet of Things (IoT) environments for various reasons, including scheduled maintenance and sensor malfunction. In IoT time-series prediction, missing values may be related to the target labels, and their missing patterns can constitute informative missingness. Missing values are therefore a barrier to accurate prediction and analysis in IoT data mining. Although several methods have been proposed to estimate missing values, few studies have compared interpolation methods based on conventional models with those based on deep learning models, and there has been relatively little research on interpolation in the IoT environment specifically. To address these gaps, this paper applies linear regression, support vector regression, artificial neural networks, and long short-term memory to predict missing values in time series. Finally, a full comparison and analysis of the interpolation methods are presented. We believe these findings can inform future work on IoT applications.
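To make the simplest of these baselines concrete, the sketch below imputes gaps in a sensor series with ordinary linear regression on the time index. This is an illustrative example only, not the paper's actual pipeline: the function name and the sample readings are hypothetical, and a real IoT setting would use richer features (lagged values, neighbouring stations) as the article's SVR, ANN, and LSTM models do.

```python
import numpy as np

def linear_regression_impute(series):
    """Fill NaN gaps by fitting y = a*t + b to the observed samples.

    A minimal linear-regression interpolation baseline: fit a straight
    line through the non-missing points, then evaluate it at the
    missing time indices.
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    observed = ~np.isnan(series)
    # Least-squares fit of a*t + b using only the observed samples
    a, b = np.polyfit(t[observed], series[observed], deg=1)
    filled = series.copy()
    filled[~observed] = a * t[~observed] + b
    return filled

# Hourly readings with two gaps (illustrative values, not real data)
readings = [10.0, 12.0, np.nan, 16.0, 18.0, np.nan, 22.0]
print(linear_regression_impute(readings))
# → [10. 12. 14. 16. 18. 20. 22.]
```

Because the observed points here lie exactly on a line, the gaps are recovered perfectly; on real air-quality data the residual error of this baseline is what the more flexible nonlinear models aim to reduce.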
Acknowledgements
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. (Grant Numbers MOST 108-2218-E-025-002-MY3 and MOST 108-2218-E-001-001).
Cite this article
Yen, N.Y., Chang, JW., Liao, JY. et al. Analysis of interpolation algorithms for the missing values in IoT time series: a case of air quality in Taiwan. J Supercomput 76, 6475–6500 (2020). https://doi.org/10.1007/s11227-019-02991-7