A systematic review of generative adversarial imputation network in missing data imputation

  • Review
Neural Computing and Applications

Abstract

Missing data is a persistent problem in data processing. To address it, researchers have developed a range of strategies, from directly deleting samples with missing values to using artificial intelligence techniques to fill in incomplete data. The generative adversarial imputation network (GAIN) is a neural network that has shown excellent performance in missing data imputation. Since GAIN was proposed in 2018, the number of publications that study and cite the model has grown significantly, and many scholars have studied and improved it in their specific fields. However, few studies have systematically surveyed the development of the GAIN model for missing data from its introduction to the present, which has left a lack of comprehensive information about GAIN's general performance in different fields. In this review, we summarize the development of the GAIN model in missing data imputation from 2018 to 2022. Based on the Web of Science (WOS) database, 32 publications are selected according to the PRISMA statement. The paper contributes in the following respects: (1) quantitatively analyzing the publication information and application fields; (2) describing the GAIN-based models, their classification, and research trends; (3) elaborating on the model attributes and missing data mechanisms; and (4) summarizing the existing issues and proposing future directions. Overall, this paper can help scholars gain further insight into missing data issues and better understand the directions in which GAIN models can be optimized.

Data availability

The data that support the findings of this study are available from the Web of Science repository but restrictions apply to the availability of these data, which were used under licence from Beijing Jiaotong University, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Beijing Jiaotong University.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Number 62173025), a major project of the National Social Science Foundation of China (Grant Number 18ZDA086), and the Beijing Natural Science Foundation (Grant Number L201003). The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Runtong Zhang.

Ethics declarations

Conflict of interest

All authors declare that there is no conflict of interest in this review.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Zhang, R. & Zhao, B. A systematic review of generative adversarial imputation network in missing data imputation. Neural Comput & Applic 35, 19685–19705 (2023). https://doi.org/10.1007/s00521-023-08840-2

  • DOI: https://doi.org/10.1007/s00521-023-08840-2
