MI2AMI: Missing Data Imputation Using Mixed Deep Gaussian Mixture Models

Fuchs, Robin; Pommeret, Denys; Stocksieker, Samuel

doi:10.1007/978-3-031-25599-1_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13810)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Imputing missing data is still a challenge for mixed datasets containing variables of different natures, such as continuous, count, ordinal, categorical, and binary variables. The recently introduced Mixed Deep Gaussian Mixture Models (MDGMM) explicitly handle such different variable types. MDGMMs learn continuous, low-dimensional representations of mixed datasets that capture the inter-variable dependence structure. We propose a model inversion that uses the learned latent representation and maps it to the observed parts of the signal. Latent areas of interest are identified for each missing value using an optimization method, and synthetic imputation values are drawn from them. This new procedure is called MI2AMI (Missing data Imputation using MIxed deep GAussian MIxture models). The approach is tested against state-of-the-art mixed data imputation algorithms based on chained equations, Random Forests, k-Nearest Neighbours, and Generative Adversarial Networks. Two missing-value designs were tested, namely Missing Completely at Random (MCAR) and Missing at Random (MAR), with missing-value rates ranging from 10% to 30%.
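
To make the two missing-value designs concrete, the following is a minimal sketch of how MCAR and MAR masks could be generated on a numeric array. It is an illustration only, not the authors' experimental protocol: the function names `mcar_mask` and `mar_mask` are hypothetical, and the logistic dependence on a single fully observed "driver" column is an assumed MAR mechanism chosen for simplicity.

```python
import numpy as np


def mcar_mask(n_rows, n_cols, rate, rng):
    """MCAR: each cell is missing with the same probability,
    independently of any data value (hypothetical helper)."""
    return rng.random((n_rows, n_cols)) < rate


def mar_mask(X, target_col, driver_col, rate, rng):
    """MAR: missingness in target_col depends only on driver_col,
    which itself stays fully observed (assumed logistic mechanism)."""
    driver = X[:, driver_col]
    # Standardise the driver so the logistic curve is centred on the data.
    z = (driver - driver.mean()) / driver.std()
    p = 1.0 / (1.0 + np.exp(-z))
    # Rescale so that the average missing probability matches `rate`.
    p = np.clip(p * rate / p.mean(), 0.0, 1.0)
    mask = np.zeros(X.shape, dtype=bool)
    mask[:, target_col] = rng.random(X.shape[0]) < p
    return mask


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))

# Apply a 20% MCAR design to every column.
X_mcar = X.copy()
X_mcar[mcar_mask(*X.shape, rate=0.2, rng=rng)] = np.nan

# Apply a 20% MAR design: column 0 goes missing more often when column 1 is large.
X_mar = X.copy()
X_mar[mar_mask(X, target_col=0, driver_col=1, rate=0.2, rng=rng)] = np.nan
```

Under MCAR the mask is independent of the data, whereas under MAR the probability of a missing entry varies with an observed variable, which is what makes MAR harder for imputation methods that ignore inter-variable dependence.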

Granted by the Research Chair NINA under the aegis of the Risk Foundation, an initiative by BNP Cardif.

Author information

Authors and Affiliations

Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
Robin Fuchs & Denys Pommeret
Lyon 1 Univ, ISFA, Lab. SAF EA2429, 69366, Lyon, France
Denys Pommeret & Samuel Stocksieker

Corresponding author

Correspondence to Denys Pommeret.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
University of Reading, Reading, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Gabriele La Malfa
University of Florida, Gainesville, FL, USA
Panos Pardalos
Free University of Bozen-Bolzano, Bolzano, Italy
Giuseppe Di Fatta
University of Catania, Catania, Italy
Giovanni Giuffrida
Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

About this paper

Cite this paper

Fuchs, R., Pommeret, D., Stocksieker, S. (2023). MI2AMI: Missing Data Imputation Using Mixed Deep Gaussian Mixture Models. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13810. Springer, Cham. https://doi.org/10.1007/978-3-031-25599-1_16

DOI: https://doi.org/10.1007/978-3-031-25599-1_16
Published: 09 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25598-4
Online ISBN: 978-3-031-25599-1
eBook Packages: Computer Science, Computer Science (R0)
