Abstract
In this work we introduce a copula-based method for imputing missing data that uses the conditional density functions of the missing variables given the observed ones. In theory, such functions can be derived from the multivariate distribution of the variables of interest. In practice, it is very difficult to model joint distributions and derive conditional distributions from them, especially when the margins differ. We propose a natural solution to the problem: by exploiting copulas, we derive the conditional density functions through the corresponding conditional copulas. The approach is appealing because copula functions enable us (1) to fit any combination of marginal distribution functions, (2) to take into account complex multivariate dependence relationships and (3) to model the marginal distributions and the dependence structure separately. We describe the method and perform a Monte Carlo study to compare it with two well-known imputation techniques: nearest neighbour donor imputation and regression imputation via the EM algorithm. Our results indicate that the proposal compares favourably with these classical methods in terms of preservation of microdata, margins and dependence structure.
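As a rough illustration of the idea of imputing from a conditional distribution obtained through a copula, the following minimal sketch draws a missing value X2 given an observed X1. It is not the authors' algorithm: the bivariate Gaussian copula, the normal margin for X1, the exponential margin for X2, and every name in the code (`impute_x2_given_x1`, `rho`, `rate2`, and so on) are assumptions chosen only because the conditional distribution then has a simple closed form.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used by the assumed Gaussian copula
EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def clamp(u):
    """Keep a probability away from 0 and 1 so inverse CDFs stay finite."""
    return min(max(u, EPS), 1.0 - EPS)


def impute_x2_given_x1(x1, rho, mu1, sigma1, rate2, rng):
    """Draw X2 from its conditional distribution given X1 = x1.

    Assumed (hypothetical) model: X1 ~ Normal(mu1, sigma1),
    X2 ~ Exponential(rate2), linked by a Gaussian copula with
    correlation rho.
    """
    # 1. Map the observed value to the copula scale through its own margin.
    u1 = clamp(NormalDist(mu1, sigma1).cdf(x1))
    z1 = STD_NORMAL.inv_cdf(u1)
    # 2. Sample from the conditional Gaussian copula:
    #    Z2 | Z1 = z1 ~ Normal(rho * z1, sqrt(1 - rho^2)).
    z2 = rng.gauss(rho * z1, math.sqrt(1.0 - rho ** 2))
    u2 = clamp(STD_NORMAL.cdf(z2))
    # 3. Map back to the data scale with the inverse margin of X2.
    return -math.log1p(-u2) / rate2


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [impute_x2_given_x1(1.5, 0.7, 0.0, 1.0, 2.0, rng) for _ in range(5)]
    print(draws)
```

The three steps mirror the separation stressed in the abstract: the margins enter only in steps 1 and 3, while the dependence structure enters only in step 2, so either could be swapped for another family without touching the other.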


Acknowledgments
The authors wish to thank Paola Monari (University of Bologna, Italy) and Antonia Manzari (Italian Statistical Institute, ISTAT) for their support and useful discussions. The first author acknowledges the support of Free University of Bozen-Bolzano, School of Economics and Management via the project “Multivariate analysis techniques based on copula function”.
Cite this article
Di Lascio, F.M.L., Giannerini, S. & Reale, A. Exploring copulas for the imputation of complex dependent data. Stat Methods Appl 24, 159–175 (2015). https://doi.org/10.1007/s10260-014-0287-2
DOI: https://doi.org/10.1007/s10260-014-0287-2