A systematic review of generative adversarial imputation network in missing data imputation

Zhang, Yuqing; Zhang, Runtong; Zhao, Butian

doi:10.1007/s00521-023-08840-2

Review
Published: 21 July 2023

Volume 35, pages 19685–19705 (2023)
Neural Computing and Applications

Abstract

Missing data is a persistent problem in data processing. Researchers have addressed it with diverse strategies, ranging from directly deleting samples that contain missing values to filling in incomplete data with artificial intelligence techniques. The generative adversarial imputation network (GAIN) is a neural network that has shown excellent performance in missing data imputation. Since GAIN was proposed in 2018, the number of publications that study and cite the model has grown significantly, and many scholars have studied and improved GAIN in their specific fields. However, few studies have systematically surveyed the development of the GAIN model for missing data from its introduction to the present, so comprehensive information about GAIN's general performance across fields is lacking. In this review, we summarize the development of the GAIN model in missing data imputation from 2018 to 2022. Based on the Web of Science (WOS) database, 32 publications are selected according to the PRISMA statement. The review covers four aspects: (1) a quantitative analysis of the publication information and application fields; (2) an exposition of GAIN-based models, their classification, and research trends; (3) an account of model attributes and missing data mechanisms; and (4) a summary of existing issues and proposed future directions. Overall, this paper can help scholars gain further insight into missing data issues and better understand the directions in which GAIN models can be optimized.
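
To make the GAIN mechanism concrete, here is a minimal Python sketch of how one GAIN training step prepares its inputs and computes its losses. It is modelled on the original formulation by Yoon et al. (2018) and on the loss form used in their public reference code as we understand it. The sketch assumes numeric data already scaled to [0, 1] and simulates missingness as missing completely at random (MCAR), one of the missing data mechanisms. It also replaces the generator and discriminator networks with hypothetical fixed stand-ins, so only the masking, hint, and loss logic is shown and no gradient updates are performed. The function names `mcar_mask`, `gain_inputs`, `combine`, and `gain_losses`, and the values chosen for `hint_rate` and `alpha`, are illustrative assumptions, not taken from the review.

```python
import math
import random

EPS = 1e-8  # guards log(0) in the cross-entropy terms


def mcar_mask(n_rows, n_cols, miss_rate, rng):
    """Mask M: 1 = observed, 0 = missing. Under MCAR each entry is dropped
    independently of the data values, with probability miss_rate."""
    return [[0 if rng.random() < miss_rate else 1 for _ in range(n_cols)]
            for _ in range(n_rows)]


def gain_inputs(x, m, hint_rate, rng):
    """Build the generator input X~ and the hint matrix H for one batch.

    Missing entries are filled with small uniform noise Z ~ U(0, 0.01);
    H = B * M with B ~ Bernoulli(hint_rate) reveals only part of the mask
    to the discriminator (hint_rate is an assumed value).
    """
    x_tilde = [[v if mij else rng.uniform(0.0, 0.01) for v, mij in zip(xr, mr)]
               for xr, mr in zip(x, m)]
    h = [[mij if rng.random() < hint_rate else 0 for mij in mr] for mr in m]
    return x_tilde, h


def combine(x_tilde, g_out, m):
    """X^ = M * X~ + (1 - M) * G: keep observed values, impute the rest."""
    return [[xt if mij else g for xt, g, mij in zip(xr, gr, mr)]
            for xr, gr, mr in zip(x_tilde, g_out, m)]


def gain_losses(m, d_prob, x_tilde, g_out, alpha):
    """Discriminator loss and generator loss (adversarial + alpha * MSE).

    d_prob[i][j] is the discriminator's estimate that entry (i, j) was observed.
    """
    pairs = [(mij, p) for mr, pr in zip(m, d_prob) for mij, p in zip(mr, pr)]
    n = len(pairs)
    # D learns to recover the mask: cross-entropy between M and D's output.
    d_loss = -sum(mij * math.log(p + EPS) + (1 - mij) * math.log(1 - p + EPS)
                  for mij, p in pairs) / n
    # G tries to make imputed (missing) entries look observed to D.
    g_adv = -sum((1 - mij) * math.log(p + EPS) for mij, p in pairs) / n
    # Reconstruction error on observed entries only, averaged over observed count.
    n_obs = sum(mij for mij, _ in pairs)
    mse = sum(mij * (xt - g) ** 2
              for mr, xr, gr in zip(m, x_tilde, g_out)
              for mij, xt, g in zip(mr, xr, gr)) / max(n_obs, 1)
    return d_loss, g_adv + alpha * mse


# Toy batch already scaled to [0, 1]; None marks a missing entry.
rng = random.Random(0)
x = [[0.2, 0.8], [0.5, None], [None, 0.1]]
m = [[0 if v is None else 1 for v in row] for row in x]
x_tilde, h = gain_inputs(x, m, hint_rate=0.9, rng=rng)
# Hypothetical stand-ins for the networks: G outputs observed column means,
# D outputs 0.5 everywhere (it cannot tell observed from imputed entries).
g_out = [[0.35, 0.45] for _ in x]
d_prob = [[0.5, 0.5] for _ in x]
x_hat = combine(x_tilde, g_out, m)
d_loss, g_loss = gain_losses(m, d_prob, x_tilde, g_out, alpha=10.0)
print(x_hat, round(d_loss, 4), round(g_loss, 4))
```

In an actual GAIN model, G and D are neural networks trained alternately. D minimizes `d_loss`, and G minimizes the combined loss, in which `alpha` trades off fooling the discriminator against reproducing the observed values. The hint matrix is included because, according to the GAIN paper, giving D partial information about M is what lets the adversarial game pin down the desired data distribution.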

Data availability

The data that support the findings of this study are available from the Web of Science repository, but restrictions apply to the availability of these data, which were used under licence from Beijing Jiaotong University, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Beijing Jiaotong University.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 62173025), a major project of the National Social Science Foundation of China (Grant No. 18ZDA086), and the Beijing Natural Science Foundation (Grant No. L201003). The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and Affiliations

School of Economics and Management, Beijing Jiaotong University, Beijing, 100044, China
Yuqing Zhang, Runtong Zhang & Butian Zhao

Corresponding author

Correspondence to Runtong Zhang.

Ethics declarations

Conflict of interest

All authors declare that there is no conflict of interest in this review.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Zhang, R. & Zhao, B. A systematic review of generative adversarial imputation network in missing data imputation. Neural Comput & Applic 35, 19685–19705 (2023). https://doi.org/10.1007/s00521-023-08840-2

Received: 02 October 2022
Accepted: 28 June 2023
Published: 21 July 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00521-023-08840-2
