Abstract
Artificial neural networks (ANNs) are now ubiquitous in data science, and Deep Learning (DL) methods have been developed to address missing-data problems. The present study compares state-of-the-art DL models based on Generative Adversarial Networks (GANs) with the well-established k-nearest-neighbours (kNN) algorithm (1951) for numerical data imputation. Using real-world and generated datasets in various missing-data scenarios, we show that kNN remains competitive with powerful DL algorithms for numerical data imputation. This review consolidates the emerging consensus that numerical data imputation does not necessarily require powerful or heavy DL tools.
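To make the kNN side of the comparison concrete, here is a minimal sketch of kNN imputation for numerical data. It is not the authors' implementation. The function names `nan_euclidean` and `knn_impute`, the default `k`, and the toy data are all hypothetical choices made for illustration. The distance uses a common NaN-aware convention: it is computed over the coordinates observed in both rows and rescaled to the full number of columns.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over coordinates observed in both rows.

    The squared sum is rescaled by (total columns / shared columns) so that
    rows sharing few observed coordinates are not favoured. Returns infinity
    when the two rows share no observed coordinate.
    """
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    return math.sqrt(len(a) / len(shared) * sum(shared))


def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that column over the k
    nearest rows in which the column is observed (illustrative sketch)."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue  # observed entries are left untouched
            # Candidate donors: other rows where column j is observed.
            donors = [(nan_euclidean(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and not math.isnan(other[j])]
            donors = [d for d in donors if math.isfinite(d[0])]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave NaN in place if no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (assumed for illustration): the second row misses its last value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, math.nan],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
]
completed = knn_impute(data, k=2)
print(completed[1])
```

With `k=2`, the two nearest donors of the incomplete row are the first and third rows, so the missing entry is filled with the mean of 3.0 and 2.8. In practice the columns would usually be standardised first, since the distance is sensitive to feature scale.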
Notes
1. Full code available at: https://github.com/DeltaFloflo/imputation_comparison
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lalande, F., Doya, K. (2022). Numerical Data Imputation: Choose kNN over Deep Learning. In: Skopal, T., Falchi, F., Lokoč, J., Sapino, M.L., Bartolini, I., Patella, M. (eds) Similarity Search and Applications. SISAP 2022. Lecture Notes in Computer Science, vol 13590. Springer, Cham. https://doi.org/10.1007/978-3-031-17849-8_1
DOI: https://doi.org/10.1007/978-3-031-17849-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17848-1
Online ISBN: 978-3-031-17849-8
eBook Packages: Computer Science (R0)