Abstract
Medical datasets are particularly prone to attribute noise, that is, missing and erroneous values. Attribute noise is known to be largely detrimental to learning performance. To maximize future learning performance, it is essential to deal with attribute noise before performing any inference. We propose a simple autoencoder-based preprocessing method that can correct mixed-type tabular data corrupted by attribute noise. No other method currently exists that handles attribute noise in tabular data in its entirety. We experimentally demonstrate that our method outperforms both state-of-the-art imputation methods and noise correction methods on several real-world medical datasets.
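To make the notion of attribute noise concrete, the following minimal sketch corrupts a numeric table with both kinds of noise named above: missing values and erroneous values. It is an illustration only and not the corruption protocol or the correction method of the paper. The function name `add_attribute_noise`, its parameters, and the choice to draw erroneous values uniformly from each column's observed range are all assumptions, and the sketch covers numeric columns only rather than mixed-type data.

```python
import numpy as np

def add_attribute_noise(X, missing_rate=0.1, error_rate=0.1, seed=0):
    """Return a corrupted copy of X and boolean masks of the corrupted cells.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X_noisy = X.copy()

    # One uniform draw per cell decides whether the cell is made missing,
    # made erroneous, or left clean, so the two masks never overlap.
    u = rng.random(X.shape)
    missing_mask = u < missing_rate
    error_mask = (u >= missing_rate) & (u < missing_rate + error_rate)

    # Erroneous values (assumption): a uniform draw from the column's range.
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    random_vals = rng.uniform(col_min, col_max, size=X.shape)
    X_noisy[error_mask] = random_vals[error_mask]

    # Missing values are encoded as NaN.
    X_noisy[missing_mask] = np.nan
    return X_noisy, missing_mask, error_mask
```

An imputation method only has to fill the cells flagged by `missing_mask`, whereas a method that handles attribute noise in full must also detect and correct the cells in `error_mask`, whose positions are unknown in practice.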
Acknowledgments
This research is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No 875171, project QUALITOP (Monitoring multidimensional aspects of QUAlity of Life after cancer ImmunoTherapy - an Open smart digital Platform for personalized prevention and patient management).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ranvier, T., Elgazel, H., Coquery, E., Benabdeslem, K. (2023). Autoencoder-Based Attribute Noise Handling Method for Medical Data. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1793. Springer, Singapore. https://doi.org/10.1007/978-981-99-1645-0_18
DOI: https://doi.org/10.1007/978-981-99-1645-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1644-3
Online ISBN: 978-981-99-1645-0
eBook Packages: Computer Science, Computer Science (R0)