Abstract
Missing data imputation aims to fill the unobserved regions of real-world data accurately so that the data become complete. Although many current methods have made remarkable advances, two issues remain the most challenging: locally homogeneous regions, especially at boundaries, and the rationality of the imputed data. To address these issues, we propose a novel Generative Adversarial Guider Imputation Network (GAGIN) for unsupervised imputation. It is based on the generative adversarial network (GAN) and is composed of a Global-Impute-Net (GIN), a Local-Impute-Net (LIN) and an Impute Guider Model (IGM). The GIN looks at the entire missing region to generate and impute data as a whole. To keep the GIN results rational, the IGM captures coherent information between the global and local views and guides the LIN to look only at a small area centered on the missing regions in focus. After these three modules have been applied, the local imputed results are concatenated with the global imputed results, which yields rational imputed values and refines the local details from rough to accurate. Comprehensive experiments demonstrate that the proposed method is significantly superior to three other state-of-the-art approaches and seven traditional methods: it achieves the best RMSE, improving on the second-best method by 17.3% on the numeric datasets and by 24.1% on the image dataset. In addition, an extensive ablation study validates its superior performance on missing data imputation.
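As a rough illustration of how an imputation result is typically assembled and scored with RMSE, the sketch below keeps the observed entries, fills the missing ones from a generated matrix, and measures the error only on the entries that were missing. It is a minimal sketch, not the GAGIN implementation: the function names `combine` and `masked_rmse`, the mask convention (1 = observed, 0 = missing), and the restriction of RMSE to missing entries are assumptions made for illustration, and the `generated` values are a hand-written stand-in for the output of any generator.

```python
import math


def combine(observed, mask, generated):
    """Keep observed entries (mask == 1) and fill missing ones (mask == 0).

    Hypothetical helper; the mask convention is an assumption.
    """
    return [
        [o if m == 1 else g for o, m, g in zip(o_row, m_row, g_row)]
        for o_row, m_row, g_row in zip(observed, mask, generated)
    ]


def masked_rmse(truth, imputed, mask):
    """RMSE computed only over entries that were missing (mask == 0)."""
    sq_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(truth, imputed, mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m == 0
    ]
    if not sq_errors:
        raise ValueError("mask marks no missing entries")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy 2x2 example: two entries are missing (mask == 0).
truth = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [0, 1]]
observed = [[1.0, 0.0], [0.0, 4.0]]   # missing entries hold a placeholder 0.0
generated = [[9.0, 2.5], [2.5, 9.0]]  # stand-in for a generator's output

imputed = combine(observed, mask, generated)
score = masked_rmse(truth, imputed, mask)
print(imputed, score)
```

Generated values at observed positions (the 9.0 entries here) are discarded by `combine`, so only the filled-in positions contribute to the score.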
Acknowledgements
This work was supported by the Qian Xuesen Laboratory of Space Technology, CAST (GZZKFJJ2020002), and by the National Key Research and Development Program of China under grant number 2018hjyzkfkt-002.
Author information
Contributions
Wei Wang was involved in supervision and project administration. Yimeng Chai was involved in methodology, software, and writing—original draft. Yue Li was involved in conceptualization, methodology, and writing—review & editing.
Ethics declarations
Conflict of Interest
We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, W., Chai, Y. & Li, Y. GAGIN: generative adversarial guider imputation network for missing data. Neural Comput & Applic 34, 7597–7610 (2022). https://doi.org/10.1007/s00521-021-06862-2
DOI: https://doi.org/10.1007/s00521-021-06862-2