Donor Limited Hot Deck Imputation: A Constrained Optimization Problem

Joenssen, Dieter William

doi:10.1007/978-3-662-44983-7_28

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Hot deck methods impute missing data by matching complete records (donors) to records with missing values (recipients). The values missing in a recipient are then replaced by copies of the corresponding values from its matched donor. Some hot deck procedures limit how often any one donor may be matched in order to increase the precision of post-imputation parameter estimates. This constraint, called a donor limit, also mitigates the risk of using a single donor for all imputations or of using a donor with one or more extreme values “too often.” Despite these desirable properties, the results of a donor-limited hot deck depend on the order in which recipients are imputed, which is undesirable. For nearest-neighbor hot deck procedures, a constraint on donor usage means that stepwise matching of each recipient to its closest available donor no longer minimizes the sum of all donor–recipient distances. Imputation results may therefore be improved further by procedures that minimize this total distance. The discrete optimization problem is formulated, and a simulation detailing the improvements possible from solving this integer program is presented.
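
The order dependence described in the abstract can be illustrated with a small sketch. It is not taken from the chapter: it assumes one-dimensional records, absolute difference as the distance, and a donor limit of 1, and all function names are hypothetical. The exhaustive search stands in for an integer programming solver and is only practical for toy inputs.

```python
from itertools import product


def greedy_match(recipients, donors, limit):
    """Match recipients one at a time to the nearest donor that still has capacity."""
    if limit * len(donors) < len(recipients):
        raise ValueError("donor limit too small to serve every recipient")
    uses = [0] * len(donors)
    matches = []
    for r in recipients:
        # Only donors below the donor limit are eligible.
        available = [j for j in range(len(donors)) if uses[j] < limit]
        best = min(available, key=lambda j: abs(r - donors[j]))
        uses[best] += 1
        matches.append(best)
    return matches


def total_distance(recipients, donors, matches):
    """Sum of donor-recipient distances for a given matching."""
    return sum(abs(r - donors[j]) for r, j in zip(recipients, matches))


def optimal_match(recipients, donors, limit):
    """Brute-force search over all matchings that respect the donor limit."""
    best, best_cost = None, float("inf")
    for cand in product(range(len(donors)), repeat=len(recipients)):
        if any(cand.count(j) > limit for j in range(len(donors))):
            continue  # violates the donor limit
        cost = total_distance(recipients, donors, cand)
        if cost < best_cost:
            best, best_cost = list(cand), cost
    return best, best_cost


# Toy data (assumed for illustration only).
donors = [0.0, 3.0]
recipients = [2.0, 4.0]
limit = 1

forward = greedy_match(recipients, donors, limit)
# Impute in reverse order, then restore the original recipient order.
backward = greedy_match(recipients[::-1], donors, limit)[::-1]
opt, opt_cost = optimal_match(recipients, donors, limit)

print(total_distance(recipients, donors, forward))   # 5.0
print(total_distance(recipients, donors, backward))  # 3.0
print(opt, opt_cost)                                 # [0, 1] 3.0
```

In this toy case the stepwise procedure gives a total distance of 5.0 or 3.0 depending only on which recipient is processed first, while the matching that minimizes the total distance always gives 3.0.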

Author information

Authors and Affiliations

Ilmenau University of Technology, Helmholtzplatz 3, 98693, Ilmenau, Germany
Dieter William Joenssen

Corresponding author

Correspondence to Dieter William Joenssen.

Editor information

Editors and Affiliations

University of Essex, Colchester, United Kingdom
Berthold Lausen
University of Luxembourg, Walferdange, Luxembourg
Sabine Krolak-Schwerdt
University of Luxembourg, Walferdange, Luxembourg
Matthias Böhmer

About this paper

Cite this paper

Joenssen, D.W. (2015). Donor Limited Hot Deck Imputation: A Constrained Optimization Problem. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_28

DOI: https://doi.org/10.1007/978-3-662-44983-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and Statistics (R0)
