Missing data mechanisms and their implications on the analysis of categorical data

Poleto, Frederico Z.; Singer, Julio M.; Paulino, Carlos Daniel

doi:10.1007/s11222-009-9143-x

Published: 25 July 2009

Volume 21, pages 31–43, (2011)

Statistics and Computing

Abstract

We review some issues related to the implications of different missing data mechanisms for statistical inference in contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random (MAR) and missing completely at random (MCAR) models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, an estimate on the boundary of the parameter space does not by itself allow us to conclude that an MNAR model is misspecified.
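As a rough illustration of the kind of comparison the abstract describes, the sketch below simulates a 2×2 table in which the second variable is missing at random (MAR) given the first, and contrasts an analysis that disregards the partially classified units with the maximum likelihood estimate under the MAR model. It is not the authors' simulation design: the cell probabilities `theta`, the missingness probabilities `p_miss`, the sample size and the function names are all assumptions chosen for illustration, and the target quantity (the marginal probability that the second variable equals 1) is just one convenient summary.

```python
import numpy as np

def simulate(theta, p_miss, n, rng):
    """Draw n units from a 2x2 table; Y2 is missing with probability p_miss[y1] (MAR)."""
    cells = rng.choice(4, size=n, p=theta.ravel())
    y1, y2 = cells // 2, cells % 2
    observed = rng.random(n) >= p_miss[y1]  # True where Y2 is recorded
    return y1, y2, observed

def complete_case(y2, observed):
    """Estimate P(Y2 = 1) from the fully classified units only."""
    return y2[observed].mean()

def ml_under_mar(y1, y2, observed):
    """ML estimate of P(Y2 = 1) under MAR with Y1 always observed.

    The likelihood factors, so P(Y1 = a) uses all units and
    P(Y2 = 1 | Y1 = a) uses the fully classified units with Y1 = a.
    """
    est = 0.0
    for a in (0, 1):
        full = observed & (y1 == a)
        if full.any():  # a row with no complete units contributes nothing here
            est += np.mean(y1 == a) * y2[full].mean()
    return est

def run_study(seed=0, reps=200, n=1000):
    """Average both estimators over repeated samples (assumed settings)."""
    rng = np.random.default_rng(seed)
    theta = np.array([[0.4, 0.1], [0.1, 0.4]])  # assumed cells; true P(Y2 = 1) = 0.5
    p_miss = np.array([0.1, 0.6])               # assumed missingness given Y1
    cc, ml = [], []
    for _ in range(reps):
        y1, y2, observed = simulate(theta, p_miss, n, rng)
        cc.append(complete_case(y2, observed))
        ml.append(ml_under_mar(y1, y2, observed))
    return float(np.mean(cc)), float(np.mean(ml))
```

Calling `run_study()` returns the average complete-case and MAR-based estimates. With these assumed settings the missingness depends strongly on the first variable, so the complete-case average drifts away from 0.5 while the MAR-based average stays close to it. Setting both entries of `p_miss` to the same value gives an MCAR mechanism, under which both estimators target the true value and differ mainly in efficiency.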

Author information

Authors and Affiliations

Departamento de Estatística, Instituto de Matemática e Estatística, Universidade de São Paulo, Caixa Postal 66281, São Paulo, SP, 05314-970, Brazil
Frederico Z. Poleto & Julio M. Singer
Departamento de Matemática, Instituto Superior Técnico, Universidade Técnica de Lisboa (and CEAUL-FCUL), Av. Rovisco Pais, 1049-001, Lisboa, Portugal
Carlos Daniel Paulino

Corresponding author

Correspondence to Julio M. Singer.

About this article

Cite this article

Poleto, F.Z., Singer, J.M. & Paulino, C.D. Missing data mechanisms and their implications on the analysis of categorical data. Stat Comput 21, 31–43 (2011). https://doi.org/10.1007/s11222-009-9143-x

Received: 09 November 2008
Accepted: 08 July 2009
Published: 25 July 2009
Issue Date: January 2011
DOI: https://doi.org/10.1007/s11222-009-9143-x
