Abstract
Missing data are a persistent problem in data processing. To address it, researchers have developed a range of strategies, from directly deleting samples with missing values to using artificial intelligence techniques to fill in incomplete data. The generative adversarial imputation network (GAIN) is a neural network that has shown excellent performance in missing data imputation. Since GAIN was proposed in 2018, the number of publications that study and cite the model has grown significantly, and many scholars have studied and improved GAIN in their specific fields. However, few studies have systematically surveyed the development of the GAIN model for missing data from its introduction to the present, which results in a lack of comprehensive information about GAIN's general performance across fields. In this review, we summarize the development of the GAIN model in missing data imputation from 2018 to 2022. Based on the Web of Science (WOS) database, 32 publications were selected according to the PRISMA statement. The paper contributes in the following aspects: (1) quantitatively analyzing the publication information and application fields; (2) expounding the GAIN-based models, their classification, and research trends; (3) elaborating on the model attributes and missing data mechanisms; and (4) summarizing the existing issues and proposing future directions. Overall, this paper can help scholars gain further insight into missing data issues and better understand the directions in which GAIN models can be optimized.
Data availability
The data that support the findings of this study are available from the Web of Science repository but restrictions apply to the availability of these data, which were used under licence from Beijing Jiaotong University, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Beijing Jiaotong University.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Number 62173025), a major project of the National Social Science Foundation of China (Grant Number 18ZDA086), and the Beijing Natural Science Foundation (Grant Number L201003). The authors have no relevant financial or non-financial interests to disclose.
Ethics declarations
Conflict of interest
All authors declare that there is no conflict of interest in this review.
About this article
Cite this article
Zhang, Y., Zhang, R. & Zhao, B. A systematic review of generative adversarial imputation network in missing data imputation. Neural Comput & Applic 35, 19685–19705 (2023). https://doi.org/10.1007/s00521-023-08840-2
DOI: https://doi.org/10.1007/s00521-023-08840-2