Statistical inference in constrained latent class models for multinomial data based on $\phi $-divergence measures

Felipe, A.; Martín, N.; Miranda, P.; Pardo, L.

doi:10.1007/s11634-017-0289-7


Regular Article
Published: 04 July 2017

Volume 12, pages 605–636, (2018)

Advances in Data Analysis and Classification

A. Felipe¹, N. Martín², P. Miranda¹ & L. Pardo¹


Abstract

In this paper we explore the possibilities of applying $\phi $-divergence measures to inferential problems in the field of latent class models (LCMs) for multinomial data. We first treat the problem of estimating the model parameters. As explained below, the minimum $\phi $-divergence estimators (M$\phi $Es) considered in this paper are a natural extension of the maximum likelihood estimator (MLE), the usual estimator for this problem; we study the asymptotic properties of M$\phi $Es, showing that they share the same asymptotic distribution as the MLE. To compare the efficiency of the M$\phi $Es when the sample size is not large enough to apply the asymptotic results, we have carried out an extensive simulation study; from this study, we conclude that there are estimators in this family that are competitive with the MLE. Next, we deal with the problem of testing whether an LCM for multinomial data fits a data set; again, $\phi $-divergence measures can be used to generate a family of test statistics generalizing both the classical likelihood ratio test and the chi-squared test statistics. Finally, we treat the problem of choosing the best model out of a sequence of nested LCMs; as before, $\phi $-divergence measures can handle the problem, and we derive a family of $\phi $-divergence test statistics based on them; we study the asymptotic behavior of these test statistics, showing that it is the same as that of the classical test statistics. A simulation study for small and moderate sample sizes shows that there are some test statistics in the family that can compete with the classical likelihood ratio and the chi-squared test statistics.
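The abstract refers to $\phi $-divergence test statistics that generalize the likelihood ratio and chi-squared statistics. The Python sketch below illustrates this. It assumes the Cressie-Read power-divergence family (Cressie and Read 1984) as the concrete choice of $\phi $, together with the definition $D_{\phi }(\varvec{p},\varvec{q})=\sum _{i}q_{i}\phi (p_{i}/q_{i})$ used in the Appendix. Under these assumptions it computes $2N D_{\phi }(\widehat{\varvec{p}},\varvec{q})/\phi ^{\prime \prime }(1)$ for a fixed hypothesized $\varvec{q}$. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def cressie_read_phi(x, lam):
    """Cressie-Read phi_lambda(x) for x > 0.

    phi_lambda(x) = (x**(lam+1) - x - lam*(x - 1)) / (lam*(lam + 1)), with the
    limiting cases lam -> 0 and lam -> -1 written out explicitly.
    Every member satisfies phi(1) = 0 and phi''(1) = 1.
    """
    x = np.asarray(x, dtype=float)
    if np.isclose(lam, 0.0):
        return x * np.log(x) - x + 1.0
    if np.isclose(lam, -1.0):
        return -np.log(x) + x - 1.0
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))


def phi_divergence(p, q, lam):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i) for probability vectors with q_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * cressie_read_phi(p / q, lam)))


def phi_statistic(counts, q, lam):
    """Goodness-of-fit statistic 2N / phi''(1) * D_phi(p_hat, q); phi''(1) = 1 here."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    return 2.0 * n * phi_divergence(counts / n, q, lam)
```

With `lam=1.0` the statistic equals Pearson's $X^{2}$, and with `lam=0.0` it equals the likelihood-ratio statistic $G^{2}$. For `lam <= 0` all observed counts must be positive, because the logarithm or a negative power of $p_{i}/q_{i}$ appears.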


Minimum $\phi $-Divergence Estimation in Constrained Latent Class Models for Binary Data

Article 28 February 2015

Phi-divergence Test Statistics Applied to Latent Class Models for Binary Data

Equivalence Tests for Multinomial Data Based on $\phi $-Divergences

Notes

Formann (1992) proposes using the EM algorithm of Dempster et al. (1977) to compute the MLE; however, we have preferred to use our own algorithm in order to compare the different estimators. Our Fortran 95 code can be found at http://www.sites.google.com/site/nirianmartinswebsite/software.
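For orientation, here is a minimal Python sketch of EM for the *unconstrained* latent class model with polytomous (multinomial) items. It is not the authors' Fortran 95 program, and it does not impose the linear logistic constraints of Formann (1992). All function and argument names are hypothetical.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def em_latent_class(x, n_classes, n_categories, n_iter=200, seed=0):
    """EM for an unconstrained latent class model with polytomous items.

    x            : (N, k) integer array, x[n, i] in {0, ..., n_categories[i] - 1}
    n_classes    : number of latent classes C
    n_categories : list with the number of categories g_i of each item
    Returns the class weights (C,), a list of (C, g_i) conditional probability
    tables and the log-likelihood evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    n_obs, k = x.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    # Random starting values for P(item i = v | class c); each row sums to one.
    tables = [rng.dirichlet(np.ones(g), size=n_classes) for g in n_categories]
    loglik = []
    for _ in range(n_iter):
        # E-step: log P(class c, observed response pattern of unit n), shape (N, C).
        log_joint = np.tile(np.log(weights), (n_obs, 1))
        for i in range(k):
            log_joint += np.log(tables[i][:, x[:, i]]).T
        log_marg = _logsumexp(log_joint, axis=1)
        loglik.append(float(log_marg.sum()))
        post = np.exp(log_joint - log_marg[:, None])  # posterior class membership
        # M-step: closed-form updates of the weights and conditional probabilities.
        class_mass = post.sum(axis=0)
        weights = class_mass / n_obs
        for i, g in enumerate(n_categories):
            onehot = np.eye(g)[x[:, i]]  # (N, g_i) indicator matrix of item i
            tables[i] = (post.T @ onehot) / class_mass[:, None]
    return weights, tables, loglik
```

Each EM iteration cannot decrease the log-likelihood, which makes a convenient sanity check. In practice several random starts (different `seed` values) are advisable, because the likelihood of an LCM can have multiple local maxima.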

References

Abar B, Loken E (2010) Self-regulated learning and self-directed study in a pre-college sample. Learn Individ Differ 20:25–29
Agresti A (1996) An introduction to categorical data analysis. Wiley, New York
Bartolucci F, Forcina A (2002) Extended RC association models allowing for order restrictions and marginal modelling. J Am Stat Assoc 97(460):1192–1199
Berkson J (1980) Minimum chi-square, not maximum likelihood! Ann Stat 8(3):457–487
Biemer P (2011) Latent class analysis of survey error. Wiley, New York
Birch M (1964) A new proof of the Pearson-Fisher theorem. Ann Math Stat 35:817–824
Bryant F, Satorra A (2012) Principles and practice of scaled difference chi-square testing. Struct Equ Model 19:372–398
Clogg C (1988) Latent class models for measuring. In: Latent trait and class models. Plenum, New York, pp 173–205
Collins L, Lanza S (2010) Latent class and latent transition analysis for the social, behavioral, and health sciences. Wiley, New York
Cressie N, Pardo L (2000) Minimum phi-divergence estimator and hierarchical testing in loglinear models. Stat Sin 10:867–884
Cressie N, Pardo L (2002) Phi-divergence statistics. In: El-Shaarawi A, Piegorsch W (eds) Encyclopedia of environmetrics, vol 13. Wiley, New York, pp 1551–1555
Cressie N, Read T (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464
Cressie N, Pardo L, Pardo M (2003) Size and power considerations for testing loglinear models using $\phi $-divergence test statistics. Stat Sin 13(2):555–570
Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
Dale J (1986) Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. J R Stat Soc Ser B 48:48–59
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Feldman B, Masyn K, Conger R (2009) New approaches to studying behaviors: a comparison of methods for modelling longitudinal, categorical and adolescent drinking data. Dev Psychol 45(3):652–676
Felipe A, Miranda P, Martín N, Pardo L (2014) Phi-divergence test statistics for testing the validity of latent class models for binary data. arXiv:1407.2165
Felipe A, Miranda P, Pardo L (2015) Minimum $\phi $-divergence estimation in constrained latent class models for binary data. Psychometrika 80(4):1020–1042
Formann A (1982) Linear logistic latent class analysis. Biom J 24:171–190
Formann A (1985) Constrained latent class models: theory and applications. Br J Math Stat Psychol 38:87–111
Formann A (1992) Linear logistic latent class analysis for polytomous data. J Am Stat Assoc 87:476–486
Genge E (2014) A latent class analysis of the public attitude towards the euro adoption in Poland. Adv Data Anal Classif 8(4):427–442
Gnaldi M, Bacci S, Bartolucci F (2016) A multilevel finite mixture item response model to cluster examinees and schools. Adv Data Anal Classif 10(1):53–70
Goodman L (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215–231
Goodman L (1979) Simple models for the analysis of association in cross-classifications having ordered categories. J Am Stat Assoc 74:537–552
Hagenaars JA, McCutcheon AL (2002) Applied latent class analysis. Cambridge University Press, Cambridge
Laska M, Pasch K, Lust K, Story M, Ehlinger E (2009) Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prev Sci 10:376–386
Lazarsfeld P (1950) The logical and mathematical foundation of latent structure analysis. In: Studies in social psychology in World War II, vol IV. Princeton University Press, Princeton, pp 362–412
Martín N, Mata R, Pardo L (2015) Comparing two treatments in terms of the likelihood ratio order. J Stat Comput Simul 85(17):3512–3534
Moon K, Hero A (2014) Multivariate $f$-divergence estimation with confidence. In: Advances in neural information processing systems, pp 2420–2428
Morales D, Pardo L, Vajda I (1995) Asymptotic divergence of estimators of discrete distributions. J Stat Plan Inference 48:347–369
Oberski DL (2016) Beyond the number of classes: separating substantive from non-substantive dependence in latent class analysis. Adv Data Anal Classif 10(2):171–182
Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, Boca Raton
Satorra A, Bentler P (2010) Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika 75(2):243–248
Uebersax J (2009) Latent structure analysis. Web document: http://www.john-uebersax.com/stat/index.htm


Acknowledgements

We are very grateful to the associate editor as well as the anonymous referees for fruitful comments and remarks that have improved the final version of the paper.

Author information

Authors and Affiliations

Department of Statistics and O.R., Complutense University of Madrid, Madrid, Spain
A. Felipe, P. Miranda & L. Pardo
Department of Statistics and O.R. II: Decision Methods, Complutense University of Madrid, Madrid, Spain
N. Martín


Corresponding author

Correspondence to L. Pardo.

Additional information

This paper was supported by the Spanish Grants MTM-2012-33740 and MTM-2015-67057.

Appendix

1.1 Proof of Theorem 1

We denote by $l^{g}$ the interior of the g-dimensional unit cube, where ${g:=\prod _{i=1}^{k}g_{i}}$. The interior of $\varDelta _{g}$ defined in (10) is contained in $l^{g}$. Let $W_{(\varvec{\lambda }_{0},\varvec{\eta }_{0})}$ be a neighborhood of $(\varvec{\lambda } _{0},\varvec{\eta }_{0})$, the true value of the unknown parameter $(\varvec{\lambda },\varvec{\eta })$, on which

$$\begin{aligned} \varvec{p}:\Theta&\rightarrow \varDelta _{g}\\ (\varvec{\lambda }^{T},\varvec{\eta }^{T})^{T}&\mapsto \varvec{p}(\varvec{\lambda },\varvec{\eta }):=(p_{1}(\varvec{\lambda },\varvec{\eta }),{\ldots },p_{g}(\varvec{\lambda },\varvec{\eta }))^{T} \end{aligned}$$

has continuous second partial derivatives. Consider the mapping

$$\begin{aligned} \varvec{F}:=(F_{1},{\ldots },F_{t+u})^{T}:l^{g}\times W_{(\varvec{\lambda }_{0},\varvec{\eta }_{0})}\rightarrow \mathbb {R}^{t+u}, \end{aligned}$$

whose components $F_{j},\,j=1,{\ldots },t+u$ are defined by

$$\begin{aligned} F_{j}(\varvec{p};(\varvec{\lambda },\varvec{\eta })):={\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{j}}},\,j=1,{\ldots },t+u, \end{aligned}$$

where $\theta _{j}$ is either $\lambda _{j}$ if $j\le t$ or $\eta _{j-t}$ if $j>t$ and $\varvec{p}$ is a g-dimensional probability vector.

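Then each $F_{j},\,j=1,{\ldots },t+u$, vanishes at $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$. With $D_{\phi }(\varvec{p},\varvec{q})=\sum _{\nu =1}^{g}q_{\nu }\phi (p_{\nu }/q_{\nu })$, we have

$$\begin{aligned} {\frac{\partial ^{2}D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{r}\partial \theta _{j}}}&=\sum _{\nu =1}^{g}\phi ^{\prime \prime }\left( {\frac{p_{\nu }}{p_{\nu }(\varvec{\lambda },\varvec{\eta })}}\right) {\frac{p_{\nu }^{2}}{p_{\nu }(\varvec{\lambda },\varvec{\eta })^{3}}}{\frac{\partial p_{\nu }(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{r}}}{\frac{\partial p_{\nu }(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{j}}}\\&\quad +\sum _{\nu =1}^{g}\left( \phi \left( {\frac{p_{\nu }}{p_{\nu }(\varvec{\lambda },\varvec{\eta })}}\right) -{\frac{p_{\nu }}{p_{\nu }(\varvec{\lambda },\varvec{\eta })}}\phi ^{\prime }\left( {\frac{p_{\nu }}{p_{\nu }(\varvec{\lambda },\varvec{\eta })}}\right) \right) {\frac{\partial ^{2}p_{\nu }(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{r}\partial \theta _{j}}}. \end{aligned}$$

At $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$ all ratios equal one. The second sum then reduces to $(\phi (1)-\phi ^{\prime }(1))\sum _{\nu =1}^{g}\partial ^{2}p_{\nu }(\varvec{\lambda }_{0},\varvec{\eta }_{0})/\partial \theta _{r}\partial \theta _{j}$, which is $0$ because $\sum _{\nu }p_{\nu }(\varvec{\lambda },\varvec{\eta })=1$ identically. Hence the $(t+u)\times (t+u)$ Jacobian matrix $\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})$ of $\varvec{F}$ with respect to $(\varvec{\lambda },\varvec{\eta })$, evaluated at the point $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$, is given by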

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0} ,\varvec{\eta }_{0})}}&=\left. {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda },\varvec{\eta })}}\right| _{(\varvec{p},(\varvec{\lambda },\varvec{\eta }))=(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda } _{0},\varvec{\eta }_{0}))}\\&=\phi ^{\prime \prime }(1)\left( \sum _{l=1}^{g}{\frac{1}{p_{l} (\varvec{\lambda }_{0},\varvec{\eta }_{0})}}{\frac{\partial p_{l}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \theta _{r}} }{\frac{\partial p_{l}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )}{\partial \theta _{j}}}\right) _{\overset{j=1,{\ldots },t+u}{r=1,{\ldots },t+u}}. \end{aligned}$$

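It remains to check that $\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})$ is nonsingular. Write $\varvec{J}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$ for the $g\times (t+u)$ Jacobian matrix of $\varvec{p}(\varvec{\lambda },\varvec{\eta })$ at $(\varvec{\lambda }_{0},\varvec{\eta }_{0})$, and set $\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}):=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\varvec{J}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$. The previous display then reads $\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})=\phi ^{\prime \prime }(1)\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$. Hence, provided $\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$ has full column rank $t+u$ and $\phi ^{\prime \prime }(1)>0$, this matrix is positive definite, and in particular nonsingular, at the point $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$.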
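The identity $\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})=\phi ^{\prime \prime }(1)\varvec{A}^{T}\varvec{A}$, with $\varvec{A}=\varvec{D}^{-1/2}\varvec{J}$, can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code. It replaces the LCM by a hypothetical multinomial-logit model $\varvec{p}(\varvec{\theta })=\mathrm{softmax}(\varvec{Z}\varvec{\theta })$ and uses the Cressie-Read $\phi $ with $\lambda =2/3$, for which $\phi ^{\prime \prime }(1)=1$. It then compares a central finite-difference Hessian of $\varvec{\theta }\mapsto D_{\phi }(\varvec{p}(\varvec{\theta }_{0}),\varvec{p}(\varvec{\theta }))$ at $\varvec{\theta }_{0}$ with $\varvec{A}^{T}\varvec{A}$.

```python
import numpy as np


def cr_phi(x, lam=2 / 3):
    """Cressie-Read phi_lambda(x) for lam not in {0, -1}; phi(1) = 0, phi''(1) = 1."""
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))


def model_p(theta, Z):
    """Hypothetical toy model (not an LCM): p(theta) = softmax(Z @ theta)."""
    eta = Z @ theta
    w = np.exp(eta - eta.max())
    return w / w.sum()


def model_jacobian(theta, Z):
    """Analytic g x d Jacobian: dp_nu/dtheta_j = p_nu * (Z[nu, j] - sum_m p_m Z[m, j])."""
    p = model_p(theta, Z)
    return p[:, None] * (Z - p @ Z)


def psi(theta, p0, Z, lam=2 / 3):
    """D_phi(p0, p(theta)) = sum_nu p_nu(theta) * phi(p0_nu / p_nu(theta))."""
    q = model_p(theta, Z)
    return float(np.sum(q * cr_phi(p0 / q, lam)))


def fd_hessian(f, theta, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    d = theta.size
    E = np.eye(d) * h
    H = np.empty((d, d))
    for r in range(d):
        for j in range(d):
            H[r, j] = (f(theta + E[r] + E[j]) - f(theta + E[r] - E[j])
                       - f(theta - E[r] + E[j]) + f(theta - E[r] - E[j])) / (4 * h * h)
    return H


# The Hessian at theta0 should match phi''(1) A^T A = A^T A.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))
theta0 = np.array([0.3, -0.5, 0.2])
p0 = model_p(theta0, Z)
A = model_jacobian(theta0, Z) / np.sqrt(p0)[:, None]  # A = D^{-1/2} J
H = fd_hessian(lambda t: psi(t, p0, Z), theta0)
```

The difference between `H` and `A.T @ A` should be at the level of finite-difference error. Because the model is evaluated exactly at $\varvec{\theta }_{0}$, the term involving second derivatives of $\varvec{p}$ vanishes, as in the display above.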

By the Implicit Function Theorem, there exist the following three objects:

- a neighborhood $U$ of $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$ on which $\varvec{J}_{\varvec{F}}$ is nonsingular; indeed, $\varvec{J}_{\varvec{F}}$ is positive definite at that point and, being continuous, remains positive definite on $U$ after shrinking $U$ if necessary;
- a set $A$;
- a continuously differentiable function

$$\begin{aligned} \widetilde{\varvec{\theta }}:A\subset l^{g}\rightarrow \mathbb {R}^{t+u} \end{aligned}$$

such that $\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A$ and

$$\left\{ (\varvec{p},(\varvec{\lambda },\varvec{\eta }))\in U:\varvec{F}(\varvec{p},(\varvec{\lambda },\varvec{\eta }))=\varvec{0}\right\} =\left\{ (\varvec{p},\widetilde{\varvec{\theta }}(\varvec{p})):\varvec{p}\in A\right\} . \tag{23}$$

Let us define

$$\begin{aligned} \psi (\varvec{\lambda },\varvec{\eta }):=D_{\phi }(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),\varvec{p} (\varvec{\lambda },\varvec{\eta })). \end{aligned}$$

As $\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A$, we conclude that

$$\begin{aligned} \varvec{F}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))={\frac{\partial D_{\phi }(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),\varvec{p} (\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}))))}{\partial (\varvec{\lambda },\varvec{\eta })}}=\varvec{0}. \end{aligned}$$

In other words, $\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$ is a stationary point of the function $\psi $. On the other hand, applying (23),

$$\begin{aligned} (\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))\in U, \end{aligned}$$

and then $\varvec{J}_{\varvec{F}}$ is positive definite at $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))$. Therefore,

$$\begin{aligned} D_{\phi }(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),\varvec{p}(\widetilde{\varvec{\theta }}(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}))))=\inf _{(\varvec{\lambda },\varvec{\eta })\in \Theta }D_{\phi }(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}),\varvec{p}(\varvec{\lambda } ,\varvec{\eta })), \end{aligned}$$

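and, by the properties of $\phi $-divergences (nonnegativity, with equality only when both arguments coincide) together with the identifiability of the model, $\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}$. Moreover, differentiating the identity $\varvec{F}(\varvec{p},\widetilde{\varvec{\theta }}(\varvec{p}))=\varvec{0}$ with respect to $\varvec{p}$ at $\varvec{p}=\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$ gives

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}+{\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}}{\frac{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}=\varvec{0}. \end{aligned}$$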

Further, we know that

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0} ,\varvec{\eta }_{0})}}=\phi ^{\prime \prime }(1)\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0}). \end{aligned}$$

The $(j,i)$th element of the $(t+u)\times g$ matrix $\partial \varvec{F}/\partial \varvec{p}$, namely ${\frac{\partial F_{j}}{\partial p_{i}}}$, is given by

$$\begin{aligned} {\frac{\partial }{\partial p_{i}}}\left( {\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta } ))}{\partial \theta _{j}}}\right) ={\frac{1}{p_{i}(\varvec{\lambda },\varvec{\eta })}}\left( -{\frac{p_{i}}{p_{i}(\varvec{\lambda },\varvec{\eta })}}\phi ^{\prime \prime }\left( {\frac{p_{i}}{p_{i} (\varvec{\lambda },\varvec{\eta })}}\right) \right) {\frac{\partial p_{i}(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{j}}} \end{aligned}$$

and, evaluated at $(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))$, where all ratios $p_{i}/p_{i}(\varvec{\lambda },\varvec{\eta })$ equal one, we have

$$\begin{aligned} {\frac{\partial }{\partial p_{i}}}\left( {\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{j}}}\right) =-{\frac{1}{p_{i}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}\phi ^{\prime \prime }\left( 1\right) {\frac{\partial p_{i}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \theta _{j}}}. \end{aligned}$$

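Since $\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\varvec{J}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$, where $\varvec{J}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$ is the $g\times (t+u)$ Jacobian matrix of $\varvec{p}(\varvec{\lambda },\varvec{\eta })$, it follows that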

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial \varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})}}=-\phi ^{\prime \prime }(1)\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D} _{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}} \end{aligned}$$

and

$$\begin{aligned} {\frac{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})} }=(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}. \end{aligned}$$

A first-order Taylor expansion of the function $\widetilde{\varvec{\theta }}$ around $\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$ yields

$$\begin{aligned} \widetilde{\varvec{\theta }}(\varvec{p})=\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+\left. {\frac{\partial \widetilde{\varvec{\theta }}(\varvec{p})}{\partial \varvec{p}}}\right| _{\varvec{p}=\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}(\varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+o(\Vert \varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\Vert ). \end{aligned}$$

But $\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}$, hence

$$\begin{aligned} \widetilde{\varvec{\theta }}(\varvec{p})&=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\\&\quad +\,o(\Vert \varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\Vert ). \end{aligned}$$

It is well known that the nonparametric estimator $\widehat{\varvec{p}}$ (the vector of sample relative frequencies) converges almost surely to the probability vector $\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})$. Therefore, almost surely for $N$ large enough, $\widehat{\varvec{p}}\in A$ and $(\widehat{\varvec{p}},\widetilde{\varvec{\theta }}(\widehat{\varvec{p}}))\in U$. Moreover, $\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})$ is the unique solution in $U$ of the system of equations

$$\begin{aligned} {\frac{\partial D_{\phi }(\widehat{\varvec{p}},\varvec{p}(\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})))}{\partial \theta _{j}}}=0,\,j=1,{\ldots },t+u. \end{aligned}$$

Therefore, $\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})$ is the minimum $\phi $-divergence estimator $\widehat{\varvec{\theta }}_{\phi }$. Since $\Vert \widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\Vert =O_{p}(N^{-1/2})$, it satisfies the relation

$$\begin{aligned} \widehat{\varvec{\theta }}_{\phi }&=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\\&\quad +o_{p}(N^{-1/2}). \end{aligned}$$

This finishes the proof. $\square $

1.2 Proof of Theorem 2

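From the BAN decomposition of Theorem 1, it follows that

$$\begin{aligned} \sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T})&=\left( \varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\right) ^{-1}\\&\quad \times \varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+o_{p}(1). \end{aligned}$$

By the Central Limit Theorem, $\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}(\varvec{0},\varvec{\Sigma }_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})})$, where $\varvec{\Sigma }_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}$.

Hence $\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T})$ is asymptotically normal with mean $\varvec{0}$ and covariance matrix $(\varvec{A}^{T}\varvec{A})^{-1}\varvec{A}^{T}\varvec{D}^{-1/2}\varvec{\Sigma }\varvec{D}^{-1/2}\varvec{A}(\varvec{A}^{T}\varvec{A})^{-1}$, where all matrices are evaluated at $(\varvec{\lambda }_{0},\varvec{\eta }_{0})$. Two facts simplify this matrix:

- $\varvec{D}^{-1/2}\varvec{\Sigma }\varvec{D}^{-1/2}=\varvec{I}-\sqrt{\varvec{p}}\sqrt{\varvec{p}}^{T}$;
- $\varvec{A}^{T}\sqrt{\varvec{p}}=\varvec{J}^{T}\varvec{1}_{g}=\varvec{0}$, because the components of $\varvec{p}(\varvec{\lambda },\varvec{\eta })$ sum to one.

The covariance matrix therefore reduces to $(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}$, which yields the result. $\square $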
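The algebra behind Theorem 2 can be checked numerically. The claim is that the asymptotic covariance of $\sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-\varvec{\theta }_{0})$ implied by the BAN decomposition equals $(\varvec{A}^{T}\varvec{A})^{-1}$. The Python sketch below makes no LCM-specific assumptions. It draws a hypothetical probability vector $\varvec{p}$ and a $g\times d$ matrix $\varvec{J}$ whose columns sum to zero, as any Jacobian of a probability vector does. It then forms $\varvec{A}=\varvec{D}^{-1/2}\varvec{J}$ and evaluates the sandwich covariance. The helper names are illustrative.

```python
import numpy as np


def random_probability_and_jacobian(g, d, rng):
    """Hypothetical p in the simplex and a g x d Jacobian J with zero column sums.

    Zero column sums mirror sum_nu dp_nu/dtheta_j = 0, which holds for any
    parametric model whose probabilities sum to one.
    """
    p = rng.dirichlet(np.ones(g))
    J = rng.normal(size=(g, d))
    J -= J.mean(axis=0)
    return p, J


def ban_asymptotic_covariance(p, J):
    """Covariance of L @ Z with Z ~ N(0, D_p - p p^T) and L = (A^T A)^{-1} A^T D_p^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(p)
    A = d_inv_sqrt[:, None] * J                     # A = D^{-1/2} J
    L = np.linalg.solve(A.T @ A, A.T * d_inv_sqrt)  # (A^T A)^{-1} A^T D^{-1/2}
    sigma = np.diag(p) - np.outer(p, p)             # multinomial covariance matrix
    return L @ sigma @ L.T, A


rng = np.random.default_rng(3)
p, J = random_probability_and_jacobian(8, 3, rng)
cov, A = ban_asymptotic_covariance(p, J)
```

`cov` should agree with `np.linalg.inv(A.T @ A)` up to rounding error. The key ingredient is `A.T @ np.sqrt(p)` being numerically zero.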

1.3 Proof of Theorem 3

A second order Taylor expansion of $D_{\phi _{1}}\left( \varvec{p} ,\varvec{q}\right) $ around $\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta } _{0}\right) \right) $ at $\left( \widehat{\varvec{p}},\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) $ is given by

$$\begin{aligned} D_{\phi _{1}}\left( \widehat{\varvec{p}},\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) ={\frac{\phi _{1} ^{\prime \prime }(1)}{2}}\text { }\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) ^{T}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \widehat{\varvec{p} }-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})\right) +o_{p}(N^{-1}). \end{aligned}$$
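The second-order expansion above says that, near $\varvec{p}(\varvec{\theta }_{0})$, $2D_{\phi _{1}}(\varvec{p},\varvec{q})/\phi _{1}^{\prime \prime }(1)$ behaves like a Pearson-type quadratic form. The Python sketch below takes the Cressie-Read family as an assumed choice of $\phi _{1}$, so that $\phi _{1}^{\prime \prime }(1)=1$. For simplicity it uses $\varvec{D}_{\varvec{q}}^{-1}$ in the quadratic form, which is asymptotically equivalent to $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1}$ when both arguments converge to $\varvec{p}(\varvec{\theta }_{0})$. The ratio of the two quantities should approach one as the perturbation shrinks.

```python
import numpy as np


def cressie_read_divergence(p, q, lam):
    """D_phi(p, q) = sum_i q_i phi_lam(p_i / q_i), Cressie-Read phi with lam not in {0, -1}."""
    x = p / q
    return float(np.sum(q * (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))))


def pearson_quadratic_form(p, q):
    """(p - q)^T D_q^{-1} (p - q)."""
    return float(np.sum((p - q) ** 2 / q))


q = np.array([0.1, 0.2, 0.3, 0.4])
delta = np.array([1.0, -2.0, 0.5, 0.5])  # sums to zero, so q + eps * delta stays in the simplex
for lam in (-0.5, 2 / 3, 1.0, 2.0):
    for eps in (1e-2, 1e-3):
        p = q + eps * delta
        ratio = 2 * cressie_read_divergence(p, q, lam) / pearson_quadratic_form(p, q)
        print(f"lam={lam:.3f} eps={eps:g} ratio={ratio:.5f}")
```

For $\lambda =1$ the ratio is exactly one. For other members the deviation from one shrinks roughly linearly in `eps`, reflecting the third-order remainder.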

By Theorem 1, applied with $\phi =\phi _{2}$, we have

$$\begin{aligned} \widehat{\varvec{\theta }}_{\phi _{2}}&=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\\&\quad +\,o_{p}(N^{-1/2}). \end{aligned}$$

The Jacobian matrix of $\varvec{p}(\varvec{\theta })$ at $\varvec{\theta }_{0}=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}$ equals $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}(\varvec{\theta }_{0})$. Therefore, a first-order Taylor expansion of $\varvec{p}(\varvec{\theta })$ around $\varvec{\theta }_{0}$ gives

$$\begin{aligned} \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})-\varvec{p}\left( \varvec{\theta }_{0}\right) =\varvec{V}\left( \varvec{\theta }_{0}\right) \left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}) \end{aligned}$$

with $\varvec{V}\left( \varvec{\theta }_{0}\right) :=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}\left( \varvec{\theta }_{0}\right) \left( \varvec{A}\left( \varvec{\theta }_{0}\right) ^{T}\varvec{A}\left( \varvec{\theta }_{0}\right) \right) ^{-1}\varvec{A}\left( \varvec{\theta }_{0}\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}$. On the other hand,

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\right) . \end{aligned}$$

Then we have

$$\begin{aligned} \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta }} _{\phi _{2}})=\left( \varvec{I}-\mathbf {V}\left( {\varvec{\theta } _{0}}\right) \right) \left( \hat{\varvec{p}}-\varvec{p} (\varvec{\theta }_{0})\right) +o_{p}(N^{-1/2}), \end{aligned}$$

and we conclude that

$$\begin{aligned} \sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})\right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\right) . \end{aligned}$$

Notice that the asymptotic distribution of

$$\begin{aligned} \frac{2N}{\phi _{1}^{\prime \prime }(1)}D_{\phi _{1}}\left( \widehat{\varvec{p}},\varvec{p}(\widehat{\varvec{\theta }} _{\phi _{2}})\right) \end{aligned}$$

coincides with the asymptotic distribution of the quadratic form

$$\begin{aligned} N\left( \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta } }_{\phi _{2}})\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) =\varvec{X} ^{T}\varvec{X}, \end{aligned}$$

with

$$\begin{aligned} \varvec{X}:=\sqrt{N}\varvec{D}_{\varvec{p}(\varvec{\theta } _{0})}^{-1/2}\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) . \end{aligned}$$

Now, as

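$$\begin{aligned} \varvec{X}\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\right) , \end{aligned}$$

we conclude that the asymptotic distribution of $\varvec{X}^{T}\varvec{X}$ is a chi-square distribution if the matrix

$$\begin{aligned} \varvec{Q}\left( \varvec{\theta }_{0}\right) :=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2} \end{aligned}$$

is symmetric and idempotent; in that case the degrees of freedom equal the trace of $\varvec{Q}\left( \varvec{\theta }_{0}\right) $. Symmetry is evident.

To establish idempotency, write $\varvec{W}(\varvec{\theta }_{0}):=\varvec{A}(\varvec{\theta }_{0})(\varvec{A}(\varvec{\theta }_{0})^{T}\varvec{A}(\varvec{\theta }_{0}))^{-1}\varvec{A}(\varvec{\theta }_{0})^{T}$, so that $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}(\varvec{I}-\varvec{V}(\varvec{\theta }_{0}))=(\varvec{I}-\varvec{W}(\varvec{\theta }_{0}))\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}$. We use two facts:

- $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}=\varvec{I}-\sqrt{\varvec{p}(\varvec{\theta }_{0})}\sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}$;
- $\varvec{W}(\varvec{\theta }_{0})\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{0}$, because the components of $\varvec{p}(\varvec{\theta })$ sum to one.

Together they give $\varvec{Q}(\varvec{\theta }_{0})=\varvec{I}-\varvec{W}(\varvec{\theta }_{0})-\sqrt{\varvec{p}(\varvec{\theta }_{0})}\sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}$. Here $\varvec{W}(\varvec{\theta }_{0})$ is an orthogonal projection of rank $u+t$ whose range is orthogonal to the unit vector $\sqrt{\varvec{p}(\varvec{\theta }_{0})}$. Hence $\varvec{Q}(\varvec{\theta }_{0})$ is idempotent with trace $g-(u+t)-1$. $\square $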
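The claim that $\varvec{Q}(\varvec{\theta }_{0})$ is symmetric and idempotent with trace $g-(u+t)-1$ can also be checked numerically. The Python sketch below uses a hypothetical random probability vector and a random Jacobian with zero column sums, with `d` playing the role of $t+u$. It builds $\varvec{V}$ as defined in the proof. It takes the asymptotic covariance of $\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}))$ to be $(\varvec{I}-\varvec{V})\varvec{\Sigma }(\varvec{I}-\varvec{V})^{T}$, so that $\varvec{Q}=\varvec{D}^{-1/2}(\varvec{I}-\varvec{V})\varvec{\Sigma }(\varvec{I}-\varvec{V})^{T}\varvec{D}^{-1/2}$.

```python
import numpy as np


def theorem3_q_matrix(p, J):
    """Q = D^{-1/2} (I - V) Sigma (I - V)^T D^{-1/2} built from p and a Jacobian J.

    V = D^{1/2} A (A^T A)^{-1} A^T D^{-1/2} with A = D^{-1/2} J, and
    Sigma = D - p p^T is the multinomial covariance matrix.
    """
    g = p.size
    d_half = np.diag(np.sqrt(p))
    d_inv_half = np.diag(1.0 / np.sqrt(p))
    A = d_inv_half @ J
    W = A @ np.linalg.solve(A.T @ A, A.T)  # orthogonal projection onto col(A)
    V = d_half @ W @ d_inv_half
    sigma = np.diag(p) - np.outer(p, p)
    I = np.eye(g)
    return d_inv_half @ (I - V) @ sigma @ (I - V).T @ d_inv_half


rng = np.random.default_rng(7)
g, d = 10, 4                      # d plays the role of t + u
p = rng.dirichlet(np.ones(g))
J = rng.normal(size=(g, d))
J -= J.mean(axis=0)               # columns of a Jacobian of p sum to zero
Q = theorem3_q_matrix(p, J)
```

Here `Q` should be symmetric and satisfy `Q @ Q == Q` up to rounding. Its trace should be $g-d-1=5$.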

1.4 Proof of Theorem 4

In Felipe et al. (2014) we established the asymptotic distribution of $S_{A-B}^{\phi _{1},\phi _{2}}$ for LCMs for binary data. Here we establish, for multinomial data, the asymptotic distribution of $T_{A-B}^{\phi _{1},\phi _{2}}$; the proof for $S_{A-B}^{\phi _{1},\phi _{2}}$ is similar. A second-order Taylor expansion of $D_{\phi _{1}}\left( \varvec{p},\varvec{q}\right) $ around $\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta }_{0}\right) \right) $, evaluated at $\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A}),\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) $, gives (see the proof of Theorem 3)

$$\begin{aligned} D_{\phi _{1}}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A}),\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right)&={\frac{\phi _{1}^{\prime \prime }(1)}{2}}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) \\&\quad +o(||\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\varvec{\theta }_{0})||^{2})+o(||\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})-\varvec{p}(\varvec{\theta }_{0})||^{2}). \end{aligned}$$

Therefore,

$$\begin{aligned} T_{A-B}^{\phi _{1},\phi _{2}}=\varvec{X}_{A-B}^{T}\varvec{X}_{A-B}+o_{p}(1), \end{aligned}$$

with

$$\begin{aligned} \varvec{X}_{A-B}:=\sqrt{N}\varvec{D}_{\varvec{p} (\varvec{\theta }_{0})}^{-1/2}\left( \varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) . \end{aligned}$$

On the other hand, we already know that

$$\begin{aligned} \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\varvec{\theta }_{0})&=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{A}\left( \varvec{A}_{A}^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}),\\ \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})-\varvec{p}(\varvec{\theta }_{0})&=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{B}\left( \varvec{A}_{B}^{T}\varvec{A}_{B}\right) ^{-1}\varvec{A}_{B}^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}), \end{aligned}$$

where $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{A}$ and $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{B}$ are the Jacobian matrices of $\varvec{p}$ under models A and B, respectively (see the proof of Theorem 3).

Let us define

$$\begin{aligned} \varvec{W}_{A}:=\varvec{A}_{A}\left( \varvec{A}_{A} ^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T},\quad \varvec{W} _{B}:=\varvec{A}_{B}\left( \varvec{A}_{B}^{T}\varvec{A} _{B}\right) ^{-1}\varvec{A}_{B}^{T}. \end{aligned}$$

Consequently, the asymptotic distribution of $\varvec{X}_{A-B}$ coincides with the asymptotic distribution of

$$\begin{aligned} \sqrt{N}\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \hat{\varvec{p} }-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) . \end{aligned}$$

Now,

$$\begin{aligned} \sqrt{N}\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \widehat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\varvec{\Sigma }^{*}\right) , \end{aligned}$$

where

$$\begin{aligned} \varvec{\Sigma }^{*}=\left( \varvec{W}_{A}-\varvec{W} _{B}\right) \varvec{D}_{\varvec{p}(\varvec{\theta }_{0})} ^{-1/2}\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0} )}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}\left( \varvec{W}_{A}-\varvec{W}_{B}\right) . \end{aligned}$$

Consequently, it suffices to show that $\varvec{\Sigma }^{*}$ is a symmetric and idempotent matrix. Symmetry is trivial, hence it suffices to show idempotency. Notice that

$$\begin{aligned} \varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2} \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}&=\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}-\varvec{p}(\varvec{\theta }_{0})\varvec{p}(\varvec{\theta }_{0})^{T}\right) \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}\\&=\varvec{I}-\sqrt{\varvec{p}(\varvec{\theta }_{0})} \sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}. \end{aligned}$$

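Next, each column of the Jacobian matrices $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{A}$ and $\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}_{B}$ is the derivative of a probability vector and therefore sums to zero. Hence $\varvec{A}_{A}^{T}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{A}_{B}^{T}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{0}$, and consequently $\varvec{W}_{A}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{W}_{B}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{0}$. Finally,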

$$\begin{aligned} \varvec{\Sigma }^{*}=\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \left( \varvec{I}-\sqrt{\varvec{p}(\varvec{\theta }_{0})}\sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}\right) \left( \varvec{W}_{A}-\varvec{W}_{B}\right) =\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \left( \varvec{W}_{A}-\varvec{W}_{B}\right) . \end{aligned}$$
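The collapse of the middle factor uses only $\varvec{W}_{A}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{W}_{B}\sqrt{\varvec{p}(\varvec{\theta }_{0})}=\varvec{0}$. The sketch below does not reproduce the paper's definition of $\varvec{A}_{A}$ and $\varvec{A}_{B}$. As an assumption, it builds random stand-ins whose columns are orthogonal to $\sqrt{\varvec{p}(\varvec{\theta }_{0})}$, which is exactly the property needed, and then checks the displayed equality.

```python
import numpy as np

def projector(A):
    """Orthogonal projector A (A^T A)^{-1} A^T onto col(A)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(6))        # hypothetical p(theta_0), strictly positive
s = np.sqrt(p)                       # unit vector, since sum(p) == 1
P_perp = np.eye(6) - np.outer(s, s)  # I - sqrt(p) sqrt(p)^T

# Assumed stand-ins: columns made orthogonal to sqrt(p), so that W sqrt(p) = 0.
A_A = P_perp @ rng.standard_normal((6, 3))
A_B = A_A[:, :1]                     # nested sub-model
W_A, W_B = projector(A_A), projector(A_B)
M = W_A - W_B

print(np.allclose(W_A @ s, 0), np.allclose(W_B @ s, 0))  # True True
print(np.allclose(M @ P_perp @ M, M @ M))                # True
```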

On the other hand,

$$\begin{aligned} \varvec{W}_{A}\varvec{W}_{B}=\varvec{W}_{B},\,\varvec{W} _{B}\varvec{W}_{A}=\varvec{W}_{B},\,\varvec{W}_{A}\varvec{W} _{A}=\varvec{W}_{A},\,\varvec{W}_{B}\varvec{W}_{B}=\varvec{W} _{B}, \end{aligned}$$

hence we conclude that

$$\begin{aligned} \varvec{\Sigma }^{*}=\varvec{W}_{A}-\varvec{W}_{B} \end{aligned}$$

and it is idempotent by the identities above. Therefore,

$$\begin{aligned} T_{A-B}^{\phi _{1},\phi _{2}}\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\chi _{\hbox {tr}(\varvec{\Sigma }^{*})}^{2}. \end{aligned}$$
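The conclusion rests on the standard fact that a quadratic form $\varvec{x}^{T}\varvec{x}$ with $\varvec{x}\sim \mathcal {N}(\varvec{0},\varvec{\Sigma })$ and $\varvec{\Sigma }$ symmetric and idempotent is chi-squared with $\hbox {tr}(\varvec{\Sigma })$ degrees of freedom. The Monte Carlo sketch below illustrates this fact with a random nested pair of projectors standing in for $\varvec{W}_{A}$ and $\varvec{W}_{B}$; the dimensions are illustrative. It does not simulate the statistic $T_{A-B}^{\phi _{1},\phi _{2}}$ itself.

```python
import numpy as np

def projector(A):
    """Orthogonal projector A (A^T A)^{-1} A^T onto col(A)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(2)
k, h1, h2 = 10, 5, 2                           # hypothetical dimensions
A_A = rng.standard_normal((k, h1))
A_B = A_A[:, :h2]                              # nested column spaces
Sigma_star = projector(A_A) - projector(A_B)   # symmetric idempotent, rank h1 - h2

# Draw x ~ N(0, Sigma_star); the covariance is singular, so use its eigendecomposition.
vals, vecs = np.linalg.eigh(Sigma_star)
root = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None)))
x = rng.standard_normal((200_000, k)) @ root.T
q = np.einsum("ij,ij->i", x, x)                # quadratic forms x^T x, one per draw

print(q.mean(), q.var())  # close to h1 - h2 = 3 and 2 * 3 = 6, the chi-squared(3) moments
```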

To obtain the degrees of freedom we compute

$$\begin{aligned} \hbox {tr}(\varvec{\Sigma }^{*})&=\hbox {tr}\left( \varvec{W} _{A}-\varvec{W}_{B}\right) \\&=\hbox {tr}\left( \varvec{W}_{A}\right) -\hbox {tr}\left( \varvec{W}_{B}\right) \\&=\hbox {tr}\left( \varvec{A}_{A}\left( \varvec{A}_{A} ^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T}\right) -\hbox {tr}\left( \varvec{A}_{B}\left( \varvec{A}_{B} ^{T}\varvec{A}_{B}\right) ^{-1}\varvec{A}_{B}^{T}\right) \\&=\hbox {tr}\left( \varvec{A}_{A}^{T}\varvec{A}_{A}\left( \varvec{A}_{A}^{T}\varvec{A}_{A}\right) ^{-1}\right) -\hbox {tr} \left( \varvec{A}_{B}^{T}\varvec{A}_{B}\left( \varvec{A}_{B} ^{T}\varvec{A}_{B}\right) ^{-1}\right) \\&=h_{1}-h_{2}. \end{aligned}$$

This finishes the proof. $\square $
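The degrees-of-freedom computation uses the cyclic property of the trace, $\hbox {tr}(\varvec{A}(\varvec{A}^{T}\varvec{A})^{-1}\varvec{A}^{T})=\hbox {tr}((\varvec{A}^{T}\varvec{A})^{-1}\varvec{A}^{T}\varvec{A})$, which is the trace of an identity matrix. So the trace of each projector equals the number of columns of its $\varvec{A}$ matrix. The small check below assumes, as the final equality suggests, that $\varvec{A}_{A}$ has $h_{1}$ columns and $\varvec{A}_{B}$ has $h_{2}$. The sizes are illustrative.

```python
import numpy as np

def projector(A):
    """Orthogonal projector A (A^T A)^{-1} A^T onto col(A)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(3)
k, h1, h2 = 12, 7, 3                 # hypothetical sizes: A_A is k x h1, A_B is k x h2
A_A = rng.standard_normal((k, h1))
A_B = A_A[:, :h2]

W_A, W_B = projector(A_A), projector(A_B)
# tr(W) equals the column count of A, so tr(W_A - W_B) = h1 - h2.
df = np.trace(W_A - W_B)
print(round(df, 8))  # 4.0
```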


About this article

Cite this article

Felipe, A., Martín, N., Miranda, P. et al. Statistical inference in constrained latent class models for multinomial data based on $\phi $-divergence measures. Adv Data Anal Classif 12, 605–636 (2018). https://doi.org/10.1007/s11634-017-0289-7


Received: 11 March 2016
Revised: 15 June 2017
Accepted: 23 June 2017
Published: 04 July 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11634-017-0289-7
