Abstract
It has been demonstrated that modified stacked denoising autoencoders (MSDAEs) can implement high-performance missing-value imputation schemes. In turn, complete MSDAE (CMSDAE) classifiers, which extend their inputs with target estimates produced by an auxiliary classifier and are trained layer by layer to recover both the observation and the target estimates, offer better classification results than MSDAEs. Investigating whether CMSDAEs can also improve MSDAE-based imputation is therefore of clear practical importance. In this correspondence, two imputation mechanisms based on CMSDAEs are considered. The first is a direct procedure in which the CMSDAE output is just the target. The second, suggested by the presence of the targets in the vectors to be autoencoded, applies well-known multitask learning (MTL) ideas, including the observations as a secondary task. Experimental results show that these CMSDAE structures increase the quality of missing-value imputation, the MTL versions in particular, which give the best result in 5 out of 6 missing-value problems.
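The direct imputation idea can be sketched with a single denoising-autoencoder layer in NumPy. This is a minimal illustration under stated assumptions: the class name `CompleteDAE`, the toy data, the stand-in auxiliary classifier, and the single-hidden-layer sigmoid architecture are all assumptions for exposition, not the paper's actual (deeper, layer-wise trained) CMSDAE. The key CMSDAE ingredient shown is that the vector being autoencoded appends a class estimate to the observation, and the network is trained to recover the clean complete vector from a corrupted input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CompleteDAE:
    """One CMSDAE-style layer: a denoising autoencoder whose input and
    reconstruction target append a class estimate y_hat to the observation x."""
    def __init__(self, d_in, d_hid, lr=0.5):
        self.W1 = rng.normal(0.0, 0.1, (d_in, d_hid))
        self.b1 = np.zeros(d_hid)
        self.W2 = rng.normal(0.0, 0.1, (d_hid, d_in))
        self.b2 = np.zeros(d_in)
        self.lr = lr

    def forward(self, v):
        h = sigmoid(v @ self.W1 + self.b1)
        return h, sigmoid(h @ self.W2 + self.b2)

    def train_step(self, v_clean, noise=0.1):
        v = v_clean + rng.normal(0.0, noise, v_clean.shape)  # corrupt the input
        h, out = self.forward(v)
        m = v.shape[0]
        # Gradient of MSE toward the CLEAN complete vector (denoising criterion)
        g_out = (out - v_clean) * out * (1.0 - out) / m
        g_hid = (g_out @ self.W2.T) * h * (1.0 - h)
        self.W2 -= self.lr * (h.T @ g_out)
        self.b2 -= self.lr * g_out.sum(axis=0)
        self.W1 -= self.lr * (v.T @ g_hid)
        self.b1 -= self.lr * g_hid.sum(axis=0)

# Toy data: four observed features plus one auxiliary class estimate.
X = rng.uniform(0.2, 0.8, (200, 4))
y_hat = sigmoid(X.sum(axis=1, keepdims=True) - 2.0)  # stand-in auxiliary classifier
V = np.hstack([X, y_hat])  # "complete" vectors [x; y_hat]

dae = CompleteDAE(d_in=5, d_hid=8)
err_before = np.abs(dae.forward(V)[1] - V).mean()
for _ in range(2000):
    dae.train_step(V)
err_after = np.abs(dae.forward(V)[1] - V).mean()

# Direct imputation: blank a missing feature and read off the reconstruction.
v = V[0].copy()
v[2] = 0.0
imputed = dae.forward(v[None, :])[1][0, 2]
print(f"recon error {err_before:.3f} -> {err_after:.3f}, imputed value {imputed:.3f}")
```

The MTL variant described in the abstract differs only in the training objective: the observations become a secondary task alongside the target, which in a sketch like this would amount to weighting the observation and target components of the reconstruction error differently.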
Acknowledgements
This work has been partially supported by the Network of Excellence MAPAS (TIN2017-90567-REDT, Ministerio de Ciencia, Innovación y Universidades) and Grant 2-BARBAS (BBVA Foundation).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Sánchez-Morales, A., Sancho-Gómez, JL. & Figueiras-Vidal, A.R. Complete autoencoders for classification with missing values. Neural Comput & Applic 33, 1951–1957 (2021). https://doi.org/10.1007/s00521-020-05066-4