A Further Proposal to Perform Multiple Imputation on a Bunch of Polytomous Items Based on Latent Class Analysis

  • Conference paper
  • In: Statistical Models for Data Analysis

Abstract

This work advances an imputation procedure for categorical scales which relies on the results of Latent Class Analysis and Multiple Imputation Analysis. The procedure uses the information stored in the joint multivariate structure of the data set and takes into account the uncertainty related to the true unobserved values. The results are validated in the Item Response Models framework by assessing how accurately key parameters are estimated in a data set in which missing values are simulated under a Missing at Random mechanism. The sensitivity of the multiple imputation method is assessed with respect to two factors: the number of latent classes specified in the Latent Class Model and the rate of missing observations in each variable. The relative accuracy of the estimates is assessed against the Multiple Imputation by Chained Equations method for handling missing data in categorical variables.
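As an illustration of the simulation design described above, the following sketch generates polytomous items and deletes values under a Missing at Random mechanism, in which the probability that a value is missing depends only on a fully observed item. The sample size, the number of items, the form of the missingness probabilities and the names `simulate_mar` and `rate` are assumptions made for illustration; they are not the settings used in the paper.

```r
set.seed(123)

# Hypothetical complete data: n respondents, J items with 4 categories each.
n <- 500; J <- 5; ncat <- 4
complete <- as.data.frame(matrix(sample(1:ncat, n * J, replace = TRUE), n, J))
names(complete) <- paste0("Y", 1:J)

# Delete values in items Y2, ..., YJ with a probability that depends only on
# the fully observed item Y1 (a MAR mechanism). `rate` is the target average
# proportion of missing values in each of those items.
simulate_mar <- function(data, rate) {
  out <- data
  w <- data[[1]] / mean(data[[1]])   # weights with mean 1, driven by Y1
  prob <- pmin(rate * w, 1)          # per-respondent missingness probability
  for (j in 2:ncol(data)) {
    miss <- runif(nrow(data)) < prob
    out[miss, j] <- NA
  }
  out
}

item <- simulate_mar(complete, rate = 0.10)
round(colMeans(is.na(item)), 2)      # observed missing rate per item
```

Varying `rate` in such a sketch corresponds to the second sensitivity factor mentioned in the abstract, the rate of missing observations in each variable.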

References

  • Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1–29. http://www.jstatsoft.org/v42/i10/.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd edn.). New York: Wiley.

  • Rizopoulos, D. (2006). ltm: latent trait models under IRT. R package version 0.5–0.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Schafer, J. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman and Hall.

  • Sulis, I., & Porcu, M. (2008). Assessing the effectiveness of a stochastic regression imputation method for ordered categorical data. Working paper. Quaderni di Ricerca CRENoS, 4. http://crenos.unica.it/crenos/it/node/269.

  • van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/.

  • Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., & Sijtsma, K. (2008). Multiple imputation of categorical data using latent class analysis. Sociological Methodology, 33, 269–297.

Author information

Corresponding author

Correspondence to Isabella Sulis.

Appendix: miLCApol Function Written in the R Language

Description: Function to implement the MILCA procedure

Use: miLCApol(m, K, cl, rep, fs, item)

Arguments:

  • item: A data frame containing the J categorical variables (the same ones specified in the fs formula), all measured on a categorical scale with K − 1 categories. The variables in item must be coded with consecutive values from 1 to K − 1, and all missing values must be coded as NA (see the poLCA manual, Linzer and Lewis (2011), for details).

  • fs: A formula that uses the items contained in the data frame item as responses, e.g. fs <- cbind(Y1, ..., YJ) ~ 1 (see the poLCA manual, Linzer and Lewis (2011), for details).

  • m: The number of randomly imputed data sets, M.

  • K: The number of categories of the items plus 1.

  • cl: The number of latent classes (see the poLCA manual, Linzer and Lewis (2011), for details).

  • rep: The number of times the poLCA estimation is repeated with different starting values in order to avoid local maxima (see the poLCA manual).
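The argument conventions above can be sketched as follows: raw responses are recoded to consecutive integers from 1 to K − 1 with NA for missing values, and the formula fs is built from the item names. The data, the category labels and the object names `raw` and `levels_used` are hypothetical and serve only to show the expected input format.

```r
# Hypothetical raw responses: three items on a 4-point scale, with gaps.
levels_used <- c("low", "mid", "high", "top")   # K - 1 = 4 categories
raw <- data.frame(
  Y1 = c("low", "mid", NA, "high", "top", "mid"),
  Y2 = c("mid", NA, "low", "top", "high", "high"),
  Y3 = c("top", "high", "mid", NA, "low", "low"),
  stringsAsFactors = FALSE
)

# Recode every item to consecutive integers 1, ..., K - 1; NA stays NA.
item <- as.data.frame(lapply(raw, function(x) match(x, levels_used)))
K <- length(levels_used) + 1   # number of categories plus 1

# Formula with all items as responses and no covariates.
fs <- as.formula(paste0("cbind(", paste(names(item), collapse = ", "), ") ~ 1"))

# With poLCA installed and miLCApol defined, a call could look like this
# (the values of m, cl and rep are arbitrary, and a real data set would
# need far more than six rows):
# imputed <- miLCApol(m = 5, K = K, cl = 2, rep = 10, fs = fs, item = item)
```

Passing the arguments by name avoids any dependence on their order in the function definition.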

Function

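library(poLCA)

miLCApol <- function(m, K, cl, rep, fs, item) {
  # Recode missing values (NA) as the extra category K, so that poLCA
  # treats missingness as an additional response category.
  itempr <- matrix(NA, nrow(item), ncol(item))
  for (j in 1:ncol(item)) {
    itempr[, j] <- ifelse(is.na(item[, j]), K, item[, j])
  }
  itempr <- as.data.frame(itempr)
  dimnames(itempr) <- dimnames(item)

  # Fit the latent class model; see the poLCA manual for further options.
  msim <- poLCA(fs, data = itempr, nclass = cl, nrep = rep, na.rm = FALSE)
  pr <- msim$probs          # list of J matrices (classes x categories)
  classm <- msim$predclass  # modal latent class of each unit
  n <- nrow(itempr)
  J <- ncol(itempr)
  R <- nrow(pr[[1]])        # number of fitted classes, even if a class
                            # is never the modal class of any unit

  # Class-specific response probabilities as a J x K x R array. The columns
  # of item are assumed to be in the same order as the responses in fs.
  # An item in which category K never occurs has fewer than K columns in
  # pr[[j]], so the remaining cells are padded with zeros.
  p <- array(0, c(J, K, R))
  for (j in 1:J) {
    kj <- ncol(pr[[j]])
    p[j, 1:kj, ] <- t(pr[[j]])
  }

  # Impute each missing value m times from the response probabilities of
  # the unit's latent class, rejecting draws of the "missing" category K.
  impm <- array(NA, c(n, J, m))
  for (t in 1:m) {
    for (i in 1:n) {
      r <- classm[i]
      for (j in 1:J) {
        if (itempr[i, j] == K) {
          repeat {
            label <- which(rmultinom(1, 1, p[j, , r]) == 1)
            if (label <= K - 1) break
          }
          impm[i, j, t] <- label
        } else {
          impm[i, j, t] <- itempr[i, j]
        }
      }
    }
  }
  return(impm)
}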
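The function returns an array with dimensions n × J × m, one completed data set per imputation. A minimal sketch of how such an array might be analysed is given below: an estimate is computed in each completed data set and the results are combined with Rubin's rules (Rubin 1987). The array is a random placeholder rather than real miLCApol output, and the statistic (the mean of the first item) is chosen only for illustration; the paper itself evaluates Item Response Model parameters.

```r
set.seed(1)

# Placeholder for the output of miLCApol: n units, J items, m imputations.
n <- 200; J <- 4; m <- 5
impm <- array(sample(1:4, n * J * m, replace = TRUE), c(n, J, m))

# Estimate and its sampling variance in each completed data set
# (here: the mean of the first item).
est <- sapply(1:m, function(t) mean(impm[, 1, t]))
wvar <- sapply(1:m, function(t) var(impm[, 1, t]) / n)

# Rubin's rules: pooled estimate, within-imputation variance,
# between-imputation variance and total variance.
q_bar <- mean(est)
w_bar <- mean(wvar)
b <- var(est)
total_var <- w_bar + (1 + 1 / m) * b

c(estimate = q_bar, se = sqrt(total_var))
```

The between-imputation term is what carries the uncertainty about the unobserved values into the final standard error.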

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Sulis, I. (2013). A Further Proposal to Perform Multiple Imputation on a Bunch of Polytomous Items Based on Latent Class Analysis. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_41
