
An accelerated EM algorithm for mixture models with uncertainty for rating data

Original paper, Computational Statistics

Abstract

The paper is framed within the literature on Louis’ identity for the observed information matrix in incomplete data problems, with a focus on the acceleration of maximum likelihood estimation that it implies for mixture models. The goal is twofold: to obtain direct expressions for the standard errors of parameter estimates from the EM algorithm, and to reduce the computational burden of the estimation procedure for a class of mixture models with uncertainty for rating variables. These results make best-subset variable selection computationally feasible, an advisable strategy for identifying response patterns from regression models in Mixture of Experts systems at large. The discussion is supported by simulation experiments and a real case study.



References

  • Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, Hoboken

  • Allik J (2014) A mixed-binomial model for Likert-type personality measure. Front Psychol 5:1–13

  • Baker SG (1992) A simple method for computing the observed information matrix when using the EM algorithm with categorical data. J Comput Graph Statist 1(1):63–76

  • Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618

  • Burnham KP, Anderson DR (2003) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York

  • Capecchi S, Piccolo D (2017) Dealing with heterogeneity in ordinal responses. Qual Quant 51:2375–2393

  • Cappelli C, Simone R, Di Iorio F (2019) CUBREMOT: a tool for building model-based trees for ordinal responses. Expert Syst Appl 124:39–49

  • Colombi R, Giordano S (2016) A class of mixture models for multidimensional ordinal data. Statist Model 16(4):322–340

  • Corduas M (2011) Assessing similarity of rating distributions by Kullback–Leibler divergence. In: Fichet A et al (eds) Classification and multivariate analysis for complex data structures, studies in classification, data analysis, and knowledge organization. Springer, Berlin, Heidelberg, pp 221–228

  • D’Elia A, Piccolo D (2005) A mixture model for preference data analysis. Comput Stat Data Anal 49:917–934

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc Ser B 39(1):1–38

  • GESIS Leibniz Institute for the Social Sciences (2016) German General Social Survey (ALLBUS)—Cumulation 1980-2014, GESIS Data Archive, Cologne. ZA4584 Data file version 1.0.0. https://doi.org/10.4232/1.12574

  • Gormley IC, Frühwirth-Schnatter S (2019) Mixture of experts models, Chapter 12. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds) Handbook of mixture analysis, 1st edn. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. https://doi.org/10.1201/9780429055911

  • Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in cub models. Adv Data Anal Classif 10(2):225–244

  • Iannario M (2008) Selecting feeling covariates in rating surveys. Statist Appl 20(2):121–134

  • Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron LXVIII(1):87–94

  • Iannario M (2012) Preliminary estimators for a mixture model of ordinal data. Adv Data Anal Classif 6(3):163–184

  • Iannario M, Monti AC, Piccolo D, Ronchetti E (2017) Robust inference for ordinal response models. Electron J Statist 11:3407–3445

  • Iannario M, Piccolo D, Simone R (2018) CUB: a class of mixture models for ordinal data. (R package version 1.1.3), http://CRAN.R-project.org/package=CUB

  • Ibrahim JG (1990) Incomplete data in generalized linear models. J Am Statist Assoc 85:765–769

  • Khalili A, Chen J (2007) Variable selection in finite mixture of regression models. J Am Statist Assoc 102(479):1025–1038

  • Louis TA (1976) Maximum likelihood estimation using pseudo-data iterations. Boston University Research Report No. 2-76

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Statist Soc Ser B 44:226–233

  • Mahalanobis PC (1936) On the generalised distance in statistics. Proc National Inst Sci India 2(1):49–55

  • Manisera M, Zuccolotto P (2014) Modeling rating data with Non Linear CUB models. Comput Stat Data Anal 78:100–118

  • McCullagh P (1980) Regression models for ordinal data. J R Statist Soc Ser B 42(2):109–142

  • McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley Series in Probability and Statistics. Wiley, Hoboken

  • Meilijson I (1989) A fast improvement of the EM algorithm on its own terms. J R Statist Soc Ser B 51:127–138

  • Meng XL, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Statist Assoc 86(416):899–909

  • Miller K (1981) On the inverse of the sum of matrices. Math Mag 54(2):67–72

  • Oakes D (1999) Direct calculation of the information matrix via the EM. J R Statist Soc Ser B 61(2):479–482

  • Orchard T, Woodbury MA (1972) A missing information principle: theory and applications. In: Proceedings of the sixth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 697–715

  • Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Statistica 5:85–104

  • Piccolo D (2006) Observed information matrix for MUB models. Quaderni di Statistica 8:33–78

  • Piccolo D, Simone R (2019a) The class of cub models: statistical foundations, inferential issues and empirical evidence. Statist Method Appl 28:389–435 (with discussions)

  • Piccolo D, Simone R (2019b) Rejoinder to the discussion of The class of cub models: statistical foundations, inferential issues and empirical evidence. Statist Method Appl 28:477–493

  • Piccolo D, Simone R, Iannario M (2019) Cumulative and cub models for rating data: a comparative analysis. Int Statist Rev 87(2):207–236

  • Pinto da Costa JF, Alonso H, Cardoso JS (2008) The unimodal model for the classification of ordinal data. Neural Netw 21:78–91. Corrigendum: Neural Netw 59:73–75 (2014)

  • Simone R (2020) FastCUB: Fast EM and Best-Subset Selection for CUB Models for Rating Data. R package version 0.0.2. https://CRAN.R-project.org/package=FastCUB

  • Simone R, Cappelli C, Di Iorio F (2019) Modelling marginal ranking distributions: the uncertainty tree. Pattern Recognit Lett 125(1):278–288

  • Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Statist Neerlandica 72(3):224–245

  • Simone R, Tutz G, Iannario M (2020) Subjective heterogeneity in response attitude for multivariate ordinal outcomes. Econ Statist 14:145–158

  • Sundberg R (1976) An iterative method for solution of the likelihood equations for incomplete data from exponential families. Commun Statist Simul Comput B5(1):55–64

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Statist Soc Ser B 58:267–288

  • Tutz G (2012) Regression for categorical data. Cambridge University Press, Cambridge

  • Zhou H, Lange K (2009) Rating movies and rating the raters who rate them. Am Stat 63:297–307


Acknowledgements

The research has been partially funded by the ‘cub Regression Model Trees project’ (project number: 000025_ALTRI_DR_1043_2017-C-CAPPELLI) of the University of Naples Federico II, Italy.

Author information

Corresponding author

Correspondence to Rosaria Simone.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: the EM algorithm for CUB models

Given the notation set in Sects. 2 and 4, for a sample \(\varvec{R} = (R_1,\ldots ,R_n)^{\prime }\) of ordinal scores collected on a scale with m categories, consider the full cub specification with covariates given in (1)–(2). Then \(\varvec{R}\) denotes the so-called incomplete data; let \(\varvec{X} = (\varvec{R}^{\prime }, \varvec{Z}^{\prime })^{\prime }\) be the complete data, with missing data \(\varvec{Z} = (Z_{1},\ldots , Z_n)^{\prime }\) given by:

$$\begin{aligned} Z_{i}= {\left\{ \begin{array}{ll} 1 &\quad \text {if the } i\text {-th observation is drawn from the feeling component} \\ 0 &\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

To be more specific, one should set \(\varvec{Z}_{1} = \varvec{Z}\) and \(\varvec{Z}_{2} = 1 - \varvec{Z}_1\) for the uncertainty component. Then, with obvious notation, consider the complete log-likelihood:

$$\begin{aligned} l_c(\varvec{\theta }; \varvec{R}, \varvec{Z}) = \sum _{i=1}^n Z_{1i}\, \log \big (\pi _i\;b_{R_i}(\xi _i) \big ) + \sum _{i=1}^n (1-Z_{1i})\, \log \big ((1-\pi _i)\dfrac{1}{m} \big ). \end{aligned}$$
(23)
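
For concreteness, the following minimal sketch renders (23) numerically. It assumes the standard cub shifted-binomial feeling component \(b_r(\xi ) = \left( {\begin{array}{c}m-1\\ r-1\end{array}}\right) \xi ^{m-r}(1-\xi )^{r-1}\), a form consistent with the score derivatives reported in the second appendix; all function names are illustrative.

```python
# A minimal sketch, assuming the standard CUB shifted-binomial feeling
# component; names (shifted_binomial, complete_loglik) are illustrative.
import numpy as np
from scipy.special import comb

def shifted_binomial(r, xi, m):
    """b_r(xi) = C(m-1, r-1) xi^(m-r) (1-xi)^(r-1) on a scale with m categories."""
    return comb(m - 1, r - 1) * xi**(m - r) * (1 - xi)**(r - 1)

def complete_loglik(r, z, pi, xi, m):
    """Complete-data log-likelihood (23); r, z, pi, xi are length-n arrays."""
    return np.sum(z * np.log(pi * shifted_binomial(r, xi, m))
                  + (1 - z) * np.log((1 - pi) / m))
```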

At the k-th iteration and for a realization \(\varvec{r} = (r_1,\ldots ,r_n)^{\prime }\) of \(\varvec{R}\), the procedure first computes the posterior probability that each observation is drawn from each component, namely:

$$\begin{aligned} \tau _{1i}^{(k)} = \dfrac{\pi _i^{(k)}\, b_{r_i}(\xi _i^{(k)})}{Pr(R_i=r_i \mid \varvec{\theta }^{(k)}, \varvec{y}_i, \varvec{w}_i)}, \end{aligned}$$
(24)
$$\begin{aligned} \tau _{2i}^{(k)} = 1- \tau _{1i}^{(k)} = \dfrac{1}{m}\,\dfrac{1-\pi _i^{(k)}}{Pr(R_i=r_i \mid \varvec{\theta }^{(k)}, \varvec{y}_i, \varvec{w}_i)} \end{aligned}$$
(25)

where one sets:

$$\begin{aligned} {\text {logit}}(\pi _i^{(k)}) = \bar{\varvec{y}}_i\, \varvec{\beta }^{(k)}, \qquad {\text {logit}}(\xi _i^{(k)}) = \bar{\varvec{w}}_i\, \varvec{\gamma }^{(k)}, \qquad i=1,\ldots ,n. \end{aligned}$$

Thus, at the k-th step, the conditional expectation of the complete log-likelihood (23) to be maximized over \(\varvec{\theta }\) is given by:

$$\begin{aligned} Q(\varvec{\theta }; \varvec{\theta }^{(k)}) &= {\mathbb {E}}_{\varvec{\theta }^{(k)}}\big [l_c(\varvec{\theta }; \varvec{R}, \varvec{Z})\,\big |\, \varvec{R} = \varvec{r}\big ] \\ &= \sum _{i=1}^{n} \tau _{1i}^{(k)} \log (\pi _i(\varvec{\beta })) + \sum _{i=1}^{n}(1- \tau _{1i}^{(k)}) \log (1 - \pi _i(\varvec{\beta })) \\ &\quad + \sum _{i=1}^{n} \tau _{1i}^{(k)} \log (b_{r_i}(\xi _i(\varvec{\gamma }))) + \sum _{i=1}^{n} (1- \tau _{1i}^{(k)}) \log \big (\dfrac{1}{m}\big ) \end{aligned}$$

yielding the updated estimate \(\varvec{\theta }^{(k+1)}\). These steps are iterated until convergence is attained within a given numerical tolerance. The Nelder–Mead algorithm is used for the maximization steps over \(\varvec{\theta } = (\varvec{\beta },\varvec{\gamma })^{\prime }\), as in the sketch below.
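
The following sketch assembles these E- and M-steps into a bare-bones EM loop, under the same assumptions as the previous sketch (shifted-binomial feeling component, logit links, design matrices including intercept columns). It is an illustration only: em_cub and its argument names are hypothetical, and the reference implementations remain the CUB and FastCUB R packages cited above.

```python
# A minimal EM loop for a CUB model with covariates; illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import comb, expit  # expit is the inverse logit

def em_cub(r, Y, W, m, tol=1e-6, max_iter=500):
    """r: length-n vector of ratings in 1..m; Y, W: design matrices
    (intercept columns included) for uncertainty and feeling."""
    b = lambda rr, xi: comb(m - 1, rr - 1) * xi**(m - rr) * (1 - xi)**(rr - 1)
    beta, gamma = np.zeros(Y.shape[1]), np.zeros(W.shape[1])
    old = -np.inf
    for _ in range(max_iter):
        # E-step, eqs. (24)-(25): posterior weight of the feeling component
        pi, xi = expit(Y @ beta), expit(W @ gamma)
        num = pi * b(r, xi)
        tau = num / (num + (1 - pi) / m)
        # M-step: maximize Q(theta; theta^(k)) by Nelder-Mead; the last term
        # of Q does not depend on theta and is dropped
        def negQ(theta):
            pi_t = expit(Y @ theta[:Y.shape[1]])
            xi_t = expit(W @ theta[Y.shape[1]:])
            return -np.sum(tau * np.log(pi_t) + (1 - tau) * np.log(1 - pi_t)
                           + tau * np.log(b(r, xi_t)))
        sol = minimize(negQ, np.concatenate([beta, gamma]), method="Nelder-Mead")
        beta, gamma = sol.x[:Y.shape[1]], sol.x[Y.shape[1]:]
        # Stop on the incomplete-data log-likelihood
        pi, xi = expit(Y @ beta), expit(W @ gamma)
        ll = np.sum(np.log(pi * b(r, xi) + (1 - pi) / m))
        if ll - old < tol:
            break
        old = ll
    return beta, gamma, tau
```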

Appendix: Louis’ identity for CUB models

In full generality, consider the cub model specification with \(p\ge 0\) and \(q\ge 0\) covariates for the uncertainty and feeling parameters, respectively. Notice that, since \({\text {logit}}(\pi _i) = \bar{\varvec{y}}_i \cdot \varvec{\beta }\) and \({\text {logit}}(\xi _i) = \bar{\varvec{w}}_i \cdot \varvec{\gamma }\), then

$$\begin{aligned} \dfrac{\partial \pi _i}{\partial \beta _j} = {\bar{y}}_{ij}\,\pi _i\,(1-\pi _i), \qquad \dfrac{\partial \xi _i}{\partial \gamma _j} = {\bar{w}}_{ij}\,\xi _i\,(1-\xi _i). \end{aligned}$$

For the complete information matrix (17), the first derivatives with respect to \(\beta _j\) and \(\gamma _l\) of the complete log-likelihood (23) are given by:

$$\begin{aligned} \dfrac{\partial \ell _c(\varvec{\theta })}{\partial \beta _j} = \sum _{i=1}^n {\bar{y}}_{ij} \big (Z_{1i} - \pi _i\big ), \qquad \dfrac{\partial \ell _c(\varvec{\theta })}{\partial \gamma _l} = \sum _{i=1}^n {\bar{w}}_{il} \,Z_{1i}\big ( m - R_i - \xi _i\,(m-1) \big ) \end{aligned}$$
(26)

from which it follows:

$$\begin{aligned} \dfrac{\partial ^2 \ell _c(\varvec{\theta })}{\partial \beta _j\, \partial \beta _k} &= - \sum _{i=1}^n {\bar{y}}_{ij}\,{\bar{y}}_{ik} \,\pi _i\,(1-\pi _i), \\ \dfrac{\partial ^2 \ell _c(\varvec{\theta })}{\partial \gamma _h\,\partial \gamma _l} &= - (m-1) \sum _{i=1}^n {\bar{w}}_{ih}\,{\bar{w}}_{il}\,Z_{1i}\,\xi _i\, (1-\xi _i). \end{aligned}$$

Taking the conditional expectation, given \(\varvec{R} = \varvec{r}\), of the negative of these second-order derivatives yields the block-wise definition in (17). The matrix specification (19) in Louis’ identity then follows straightforwardly, since \({\mathbb {E}}[Z_{1i}|\varvec{R}=\varvec{r}] = \tau _i\).

Starting from the complete score vector (26), matrix \({\mathcal {V}}_c\) in (18) can be obtained as follows:

$$\begin{aligned} {\mathcal {V}}_c[j,l] &= {\mathbb {E}}\Big [ \Big ( \sum _{i=1}^n {\bar{y}}_{ij}(Z_{1i} - \pi _i)\Big )\Big (\sum _{t=1}^n {\bar{y}}_{tl}(Z_{1t} - \pi _t)\Big ) \,\Big |\, \varvec{R} = \varvec{r} \Big ] \\ &= \sum _{i=1}^n \sum _{t=1}^n {\bar{y}}_{ij}{\bar{y}}_{tl}\,Cov\big (Z_{1i}-\pi _i,\,Z_{1t} - \pi _t \,\big |\, \varvec{R} = \varvec{r} \big ) \\ &\quad + {\mathbb {E}}\Big [\sum _{i=1}^n {\bar{y}}_{ij}(Z_{1i} - \pi _i)\,\Big |\,\varvec{R}=\varvec{r}\Big ]\,{\mathbb {E}}\Big [\sum _{t=1}^n {\bar{y}}_{tl}(Z_{1t} - \pi _t)\,\Big |\,\varvec{R}=\varvec{r}\Big ] \\ &= \sum _{i=1}^n {\bar{y}}_{ij}{\bar{y}}_{il}\, \tau _{1i}(1-\tau _{1i}) + \Big (\sum _{i=1}^n {\bar{y}}_{ij}(\tau _{1i}-\pi _i)\Big ) \Big (\sum _{i=1}^n {\bar{y}}_{il}(\tau _{1i}-\pi _i)\Big ) \\ &= {\varvec{Y}}_{\varvec{\tau }}[,j]\cdot \varvec{Y}[,l] + {\mathcal {V}}[j,l] \end{aligned}$$

according to the notation introduced in Sect. 4 (notice that the second-to-last row of the above identity uses the fact that, given \(\varvec{R}=\varvec{r}\), the \(Z_{1i}\)’s are independent Bernoulli random variables with success probabilities \(\tau _{1i}\)). Similar steps can be easily pursued for the other blocks of the matrix (18).

Finally, the score vector for the incomplete data problem is obtained by taking the first partial derivatives of the incomplete log-likelihood (3) with respect to \(\beta _j\), \(j=0,\ldots ,p\) and \(\gamma _l\), \(l=0,\ldots ,q\):

$$\begin{aligned} \dfrac{\partial \ell (\varvec{\theta })}{\partial \beta _j} = \sum _{i=1}^n {\bar{y}}_{ij} \big ( \tau _i - \pi _i \big ), \qquad \dfrac{\partial \ell (\varvec{\theta })}{\partial \gamma _l} = -\sum _{i=1}^n {\bar{w}}_{il} \,\tau _i\, a_i \end{aligned}$$

with \(a_i = 1- r_i + (m-1)(1-\xi _i)\). Accordingly, matrix (19) is obtained as the outer (column-by-row) product of the incomplete score vector with its transpose, as illustrated in the sketch below.
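
By way of illustration, the following sketch evaluates the \(\varvec{\beta }\)-block of the observed information matrix via Louis’ identity, exploiting the fact derived above that \({\mathcal {V}}_c\) minus the outer product of the incomplete score reduces to the conditional covariance term. It is a sketch under the previous assumptions (pi and tau evaluated at the EM estimates), not the FastCUB implementation; the \(\varvec{\gamma }\) and cross blocks would be handled analogously for the full matrix.

```python
# Sketch: beta-beta block of the observed information via Louis' identity,
# I_obs = E[I_c | R] - (V_c - S S'), evaluated at the EM estimates.
import numpy as np

def louis_beta_block(Y, pi, tau):
    """Y: n x (p+1) design matrix; pi, tau: fitted uncertainty parameters
    and posterior weights (E-step output at convergence)."""
    # Complete-information block (17): sum_i y_i y_i' pi_i (1 - pi_i)
    I_c = (Y * (pi * (1 - pi))[:, None]).T @ Y
    # Missing information: V_c (18) minus the score outer product (19)
    # reduces to sum_i y_i y_i' tau_i (1 - tau_i), by the derivation above
    missing = (Y * (tau * (1 - tau))[:, None]).T @ Y
    return I_c - missing

# Standard errors for beta would follow from the inverse of the full
# observed information; neglecting the cross blocks with gamma, a rough
# approximation is:
# se_beta = np.sqrt(np.diag(np.linalg.inv(louis_beta_block(Y, pi, tau))))
```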

About this article

Cite this article

Simone, R. An accelerated EM algorithm for mixture models with uncertainty for rating data. Comput Stat 36, 691–714 (2021). https://doi.org/10.1007/s00180-020-01004-z

