Missing value estimation of microarray data using Sim-GAN

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Microarray data analysis demands great care because it plays a significant role in cancer research. The data extraction process is complex, and some expression values are lost along the way (missing values), which distorts the data in ways that cannot be recovered directly. Imputing missing values is therefore a crucial preprocessing step in analyzing microarray data. Numerous methods have been designed to address the problem, but their results are unsatisfactory when the missing rate is high. To estimate the missing expression values and complete the dataset, a novel method based on a similarity index and a generative adversarial network (Sim-GAN) is proposed. First, the raw dataset is divided into two subsets: the target set (genes with missing expression values) and the candidate set (genes without missing values). Next, the similarity index between target genes and candidate genes is computed. Because microarray data reflect several biological factors, three similarity matrices (structural, functional, and semantic similarity) are derived to find a small subset of candidate genes for each target gene. For structural similarity, a novel approach is used that reduces the time complexity to O(1) and handles nonlinearity. The resulting subsets are then fed into a generative adversarial network to compute the missing values of the target genes. The experimental results support the claim that the proposed method performs satisfactorily in terms of producing meaningful expression values. A detailed comparative study based on several statistical (NRMSE, AUROC, etc.) and biological (CPP, BLCI) metrics confirms that the proposed Sim-GAN outperforms existing missing value estimation techniques.
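The first two steps of the pipeline described above (splitting genes into target and candidate sets, then selecting a small candidate subset per target gene) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define the structural, functional, or semantic similarity measures, so the sketch substitutes absolute Pearson correlation over the target gene's observed samples as a stand-in. The function names, the toy matrix, and the NRMSE definition used here (root mean squared error divided by the standard deviation of the true values, one common convention) are all assumptions for illustration. The GAN stage is omitted.

```python
import numpy as np


def split_target_candidate(X):
    """Split a genes x samples matrix into target rows (contain NaN)
    and candidate rows (fully observed). Returns two index arrays."""
    has_missing = np.isnan(X).any(axis=1)
    return np.flatnonzero(has_missing), np.flatnonzero(~has_missing)


def top_k_candidates(X, target_row, candidate_idx, k):
    """Rank candidate genes by |Pearson r| with one target gene.

    Assumption: correlation is a placeholder for the paper's three
    similarity matrices, which the abstract does not specify.
    Only samples observed in the target gene are compared.
    """
    observed = ~np.isnan(X[target_row])
    t = X[target_row, observed]
    scores = np.empty(len(candidate_idx))
    with np.errstate(invalid="ignore", divide="ignore"):
        for j, c in enumerate(candidate_idx):
            r = np.corrcoef(t, X[c, observed])[0, 1]
            # A constant row yields NaN correlation; treat it as dissimilar.
            scores[j] = 0.0 if np.isnan(r) else abs(r)
    order = np.argsort(-scores, kind="stable")[:k]
    return candidate_idx[order]


def nrmse(true_values, imputed_values):
    """Assumed NRMSE convention: sqrt(mean squared error / variance of truth)."""
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    mse = np.mean((true_values - imputed_values) ** 2)
    return float(np.sqrt(mse / np.var(true_values)))


# Hypothetical toy expression matrix: 4 genes x 4 samples.
X = np.array([
    [1.0, 2.0, np.nan, 4.0],   # gene 0: target (one missing value)
    [2.0, 4.0, 6.0, 8.0],      # gene 1: candidate
    [4.0, 1.0, 3.0, 2.0],      # gene 2: candidate
    [1.0, np.nan, 3.0, 4.0],   # gene 3: target (one missing value)
])

target_idx, candidate_idx = split_target_candidate(X)
best_for_gene0 = top_k_candidates(X, 0, candidate_idx, k=1)
```

In this toy matrix, genes 0 and 3 form the target set and genes 1 and 2 the candidate set. Gene 1 is selected for gene 0 because it is perfectly correlated with it over the observed samples. In the method described above, the selected subset would then be passed to the generative adversarial network rather than used directly.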


Acknowledgements

The authors would like to thank anonymous reviewers for their valuable comments.

Author information

Corresponding authors

Correspondence to Soumen Kumar Pati or Ayan Banerjee.

Ethics declarations

Conflict of interests

The authors declare that there are no conflicts of interest in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pati, S.K., Gupta, M.K., Shai, R. et al. Missing value estimation of microarray data using Sim-GAN. Knowl Inf Syst 64, 2661–2687 (2022). https://doi.org/10.1007/s10115-022-01718-0

  • DOI: https://doi.org/10.1007/s10115-022-01718-0
