survey

A Comprehensive Survey on Imputation of Missing Data in Internet of Things

Published: 15 December 2022

Abstract

The Internet of Things (IoT) is enabled by the latest developments in smart sensors, communication technologies, and Internet protocols, and it has broad applications. Collecting data from IoT devices and deriving information from those data become tedious in real-life applications when datasets contain missing values. Handling missing data in a timely manner is critical for intelligent decision-making. Hence, this survey attempts to provide a structured and comprehensive overview of research on the imputation of incomplete data in IoT. The article starts with an overview of incomplete data based on the architecture of IoT. It then discusses the strategies for handling missing data, the assumptions they rely on, the computing platforms involved, and the issues related to each. The article also explores applications of imputation in IoT. We encourage researchers and data analysts to use known imputation techniques, and we discuss various issues and challenges. Finally, we suggest potential future directions for imputation methods. We believe this survey will provide a better understanding of research on incomplete data and serve as a guide for future work.

Published In

ACM Computing Surveys, Volume 55, Issue 7
July 2023
813 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3567472

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 December 2022
Online AM: 23 May 2022
Accepted: 23 April 2022
Revised: 30 November 2021
Received: 02 July 2021
Published in CSUR Volume 55, Issue 7


Author Tags

  1. Imputation of missing data
  2. multiple imputations
  3. machine learning
  4. deep learning
  5. computing platform for incomplete data
  6. Internet of Things

Qualifiers

  • Survey
  • Refereed


