Abstract
This work advances an imputation procedure for categorical scales which relies on the results of Latent Class Analysis and Multiple Imputation Analysis. The procedure exploits the information stored in the joint multivariate structure of the data set and takes into account the uncertainty about the true unobserved values. The procedure is validated within the Item Response Models framework by assessing how accurately key parameters are estimated in a data set in which observations are simulated as Missing at Random. The sensitivity of the multiple imputation methods is assessed with respect to two factors: the number of latent classes specified in the Latent Class Model and the rate of missing observations in each variable. Relative accuracy in estimation is assessed against Multiple Imputation by Chained Equations (MICE), a missing-data handling method for categorical variables.
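Multiple imputation usually accounts for the uncertainty about the unobserved values by analysing each of the M imputed data sets separately and pooling the results with Rubin's (1987) combining rules. The base R sketch below shows those standard rules for a single scalar parameter. The function name pool_rubin and the numbers in the example are hypothetical, and the chapter's own pooling step is not shown in this excerpt.

```r
# Pool M completed-data estimates of one scalar parameter with Rubin's (1987) rules.
# est: numeric vector of M point estimates (one per imputed data set)
# se:  numeric vector of the M corresponding standard errors
pool_rubin <- function(est, se) {
  M <- length(est)
  stopifnot(M >= 2, length(se) == M)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # average within-imputation variance
  b     <- var(est)                  # between-imputation variance
  t_var <- u_bar + (1 + 1 / M) * b   # total variance of the pooled estimate
  list(estimate = q_bar, se = sqrt(t_var), within = u_bar, between = b)
}

# Hypothetical estimates of one item parameter from M = 3 imputed data sets
res <- pool_rubin(est = c(1.0, 1.2, 1.1), se = c(0.10, 0.10, 0.10))
res$estimate
res$se
```

The between-imputation term b is what carries the extra uncertainty due to the missing values: if all imputations agreed, b would be zero and the pooled standard error would reduce to the average within-imputation one.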
References
Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1–29. http://www.jstatsoft.org/v42/i10/.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd edn.). New York: Wiley.
Rizopoulos, D. (2006). ltm: latent trait models under IRT. R package version 0.5-0.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman and Hall.
Sulis, I., & Porcu, M. (2008). Assessing the effectiveness of a stochastic regression imputation method for ordered categorical data. Working paper. Quaderni di Ricerca CRENoS, 4. http://crenos.unica.it/crenos/it/node/269.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/.
Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., & Sijtsma, K. (2008). Multiple imputation of categorical data using latent class analysis. Sociological Methodology, 33, 269–297.
Appendix: miLCApol Function Written in the R Language
Description: Function to implement the MILCA procedure
Usage: miLCApol(m, K, cl, rep, fs, item)
Arguments:
- item: A data frame containing the J categorical variables named in the fs formula, all measured on a scale with K − 1 categories. The variables must be coded with consecutive integers from 1 to K − 1, and all missing values must be coded as NA (see the poLCA manual, Linzer and Lewis (2011), for details).
- fs: A formula whose responses are the items contained in the data frame item, e.g. fs <- cbind(Y1, ..., YJ) ~ 1 (see the poLCA manual, Linzer and Lewis (2011), for details).
- m: The number M of imputed data sets to generate.
- K: The number of item categories plus 1; the extra code K is used internally to flag missing values.
- cl: The number of latent classes (passed to the nclass argument of poLCA; see the poLCA manual, Linzer and Lewis (2011), for details).
- rep: The number of times poLCA estimates the model from different starting values in order to avoid local maxima (passed to the nrep argument of poLCA; see the poLCA manual).
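The coding rules for item and fs are easy to get wrong, so a quick check before calling the function can help. The base R sketch below uses a hypothetical helper, check_items, that verifies the 1 to K − 1 integer coding with NA for missing values, builds fs from the column names, and reproduces the NA-to-K recoding that miLCApol performs internally. The toy data are invented for illustration.

```r
# Hypothetical helper: check that each item is numeric, integer-valued and
# coded 1..K-1, with missing values as NA.
check_items <- function(item, K) {
  stopifnot(is.data.frame(item), K >= 3)
  ok <- vapply(item, function(col) {
    obs <- col[!is.na(col)]
    is.numeric(col) && all(obs == round(obs)) && all(obs >= 1 & obs <= K - 1)
  }, logical(1))
  if (!all(ok)) {
    stop("items not coded 1..K-1: ", paste(names(item)[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data: J = 3 items on a 4-point scale (so K = 5), with some NAs
item <- data.frame(Y1 = c(1, 2, NA, 4, 3),
                   Y2 = c(2, NA, 2, 1, 4),
                   Y3 = c(4, 4, 3, NA, 1))
K <- 5
check_items(item, K)

# Build fs = cbind(Y1, Y2, Y3) ~ 1 from the column names
fs <- as.formula(paste0("cbind(", paste(names(item), collapse = ", "), ") ~ 1"))

# The recoding miLCApol performs internally: NA becomes the extra category K
recoded <- as.data.frame(lapply(item, function(col) ifelse(is.na(col), K, col)))
```

Building fs from names(item) keeps the formula and the data frame in step, which the item argument requires.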
Function
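library(poLCA)
miLCApol <- function(m, K, cl, rep, fs, item) {
  # Recode missing values (NA) as the extra category K, so that poLCA
  # treats non-response as an additional response option
  replacemiss <- function(item) {
    itemp <- matrix(NA, nrow(item), ncol(item))
    for (i in 1:ncol(item)) {
      itemp[, i] <- ifelse(is.na(item[, i]), K, item[, i])
    }
    return(itemp)
  }
  itempr <- as.data.frame(replacemiss(item))
  dimnames(itempr) <- dimnames(item)
  ## see the poLCA manual to specify further options in poLCA
  msim <- poLCA(fs, data = itempr, nclass = cl, nrep = rep, na.rm = FALSE)
  pr <- msim$probs          # list of J matrices: classes x response codes
  classm <- msim$predclass  # modal class assigned to each unit
  n <- nrow(itempr)
  R <- cl                   # all cl classes, even if some have no modal members
  J <- ncol(itempr)
  # Class-conditional probabilities in a J x K x R array; an item observed on
  # fewer than K codes (e.g. with no missing values) is padded with zeros
  p <- array(0, c(J, K, R))
  for (r in 1:R) {
    for (j in 1:J) {
      pj <- pr[[j]][r, ]
      p[j, seq_along(pj), r] <- pj
    }
  }
  impm <- array(NA, c(n, J, m))
  for (t in 1:m) {
    for (i in 1:n) {
      r <- classm[i]
      for (j in 1:J) {
        impm[i, j, t] <- if (itempr[i, j] == K) {
          # Draw a code from the distribution of item j in class r,
          # redrawing while the draw is the missing-data code K
          label <- K
          while (label > K - 1) {
            label <- which(rmultinom(1, 1, p[j, , r]) == 1)
          }
          label
        } else {
          itempr[i, j]
        }
      }
    }
  }
  return(impm)  # n x J x m array: one completed data set per slice
}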
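The while loop in miLCApol is a rejection step: a draw from the class-conditional distribution over all K codes is discarded whenever it lands on the missing-data code K. Accepting only draws in 1, ..., K − 1 is equivalent to drawing once from those K − 1 probabilities renormalised to sum to one, and the expected number of loop iterations is 1/(1 − P(Y_j = K | r)). The base R sketch below, with hypothetical function names and probabilities, checks that equivalence by simulation.

```r
# p_jr: probabilities P(Y_j = k | class r) for k = 1..K, where K is the
# missing-data code (hypothetical values in the example below)

# (a) Rejection, as in miLCApol: redraw until the code is not K
draw_rejection <- function(p_jr) {
  K <- length(p_jr)
  repeat {
    k <- sample.int(K, 1, prob = p_jr)
    if (k < K) return(k)
  }
}

# (b) Direct: one draw from codes 1..K-1; sample.int rescales the weights
draw_direct <- function(p_jr) {
  K <- length(p_jr)
  sample.int(K - 1, 1, prob = p_jr[-K])
}

set.seed(1)
p_jr <- c(0.1, 0.3, 0.2, 0.4)               # K = 4, code 4 = missing
n_draw <- 20000
freq_a <- tabulate(replicate(n_draw, draw_rejection(p_jr)), nbins = 3) / n_draw
freq_b <- tabulate(replicate(n_draw, draw_direct(p_jr)), nbins = 3) / n_draw
target <- p_jr[-4] / sum(p_jr[-4])          # renormalised probabilities
round(rbind(freq_a, freq_b, target), 3)
```

Both empirical frequencies should sit close to the renormalised target. Assuming poLCA is installed, a call to the appendix function on a data frame item of items on a four-point scale (so K = 5) might look like imp <- miLCApol(m = 5, K = 5, cl = 3, rep = 10, fs = fs, item = item), where the values of m, cl and rep are illustrative; imp[, , 1] is then the first completed n × J data set.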
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Sulis, I. (2013). A Further Proposal to Perform Multiple Imputation on a Bunch of Polytomous Items Based on Latent Class Analysis. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_41
DOI: https://doi.org/10.1007/978-3-319-00032-9_41
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics (R0)