GAGIN: generative adversarial guider imputation network for missing data

  • Original Article
  • Neural Computing and Applications

Abstract

Missing data imputation aims to accurately fill the unobserved regions of real-world data so that the data become complete. Although many current methods have made remarkable advances, two issues remain the most challenging: locally homogeneous regions, especially at boundaries, and the rationality of the imputed data. To address these issues, we propose a novel Generative Adversarial Guider Imputation Network (GAGIN), based on the generative adversarial network (GAN), for unsupervised imputation. It is composed of a Global-Impute-Net (GIN), a Local-Impute-Net (LIN) and an Impute Guider Model (IGM). The GIN looks at the entire missing region to generate and impute data as a whole. To account for the rationality of the GIN results, the IGM captures coherent information between the global and local views and guides the LIN to look only at a small area centered on the focused missing regions. After these three modules have run, the local imputed results are concatenated with the global imputed results, which yields rational imputed values and refines the local details from coarse to fine. Comprehensive experiments demonstrate that the proposed method is significantly superior to three other state-of-the-art approaches and seven traditional methods: it achieves the best RMSE, surpassing the second-best method on both the numeric datasets (by 17.3%) and the image dataset (by 24.1%). In addition, an extensive ablation study validates its superior performance on missing data imputation.
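The global-then-local refinement and the RMSE comparison described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the merge rule in `combine_imputations`, the boolean `local_region` window, the toy arrays, and the formula in `relative_improvement` are all assumptions made for illustration, since the abstract does not specify how the results are concatenated or how the reported percentages were computed.

```python
import numpy as np

def combine_imputations(x, observed_mask, global_fill, local_fill, local_region):
    """Keep observed values; fill missing entries from the local result inside
    local_region and from the global result elsewhere (an assumed merge rule)."""
    filled = np.where(local_region, local_fill, global_fill)
    return np.where(observed_mask, x, filled)

def masked_rmse(x_true, x_imputed, observed_mask):
    """RMSE computed only over the entries that were missing."""
    missing = ~observed_mask
    diff = (x_true - x_imputed)[missing]
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_improvement(rmse_best, rmse_second):
    """Percentage by which the best RMSE undercuts the second-best RMSE
    (an assumed definition of the reported improvement)."""
    return 100.0 * (rmse_second - rmse_best) / rmse_second

# Toy ground truth and missingness pattern (True = observed). Hypothetical data.
x_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
observed_mask = np.array([[True, False, True],
                          [False, True, False]])
x_observed = np.where(observed_mask, x_true, 0.0)

# Stand-ins for the outputs of a global and a local imputer.
global_fill = np.full_like(x_true, 3.0)          # coarse guess everywhere
local_fill = np.array([[0.0, 2.0, 0.0],
                       [0.0, 0.0, 0.0]])         # refined guess in one window
local_region = np.array([[False, True, False],
                         [False, False, False]])  # window the local net focuses on

x_global_only = np.where(observed_mask, x_observed, global_fill)
x_imputed = combine_imputations(x_observed, observed_mask,
                                global_fill, local_fill, local_region)

rmse_global = masked_rmse(x_true, x_global_only, observed_mask)
rmse_combined = masked_rmse(x_true, x_imputed, observed_mask)
print(rmse_global, rmse_combined)
```

In this toy case the local window corrects one missing entry exactly, so the combined RMSE over missing entries is lower than the global-only RMSE. Evaluating only on the missing entries is a common convention for imputation benchmarks, but the abstract does not state which entries the reported RMSE covers.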

Acknowledgements

This work was supported by the Qian Xuesen Laboratory of Space Technology, CAST (GZZKFJJ2020002), and the National Key Research and Development Program of China under grant number 2018hjyzkfkt-002.

Author information

Contributions

Wei Wang was involved in supervision and project administration. Yimeng Chai was involved in methodology, software, and writing—original draft. Yue Li was involved in conceptualization, methodology, and writing—review & editing.

Corresponding author

Correspondence to Yue Li.

Ethics declarations

Conflict of Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, W., Chai, Y. & Li, Y. GAGIN: generative adversarial guider imputation network for missing data. Neural Comput & Applic 34, 7597–7610 (2022). https://doi.org/10.1007/s00521-021-06862-2

  • DOI: https://doi.org/10.1007/s00521-021-06862-2
