
An accelerated EM algorithm for mixture models with uncertainty for rating data

Original paper, Computational Statistics

Abstract

The paper is framed within the literature on Louis’ identity for the observed information matrix in incomplete data problems, with a focus on the acceleration of maximum likelihood estimation that it implies for mixture models. The goal is twofold: to obtain direct expressions for the standard errors of parameter estimates from the EM algorithm, and to reduce the computational burden of the estimation procedure for a class of mixture models with uncertainty for rating variables. These results make best-subset variable selection computationally feasible, an advisable strategy for identifying response patterns from regression models in Mixture of Experts systems at large. The discussion is supported by simulation experiments and a real case study.



References

  • Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, Hoboken

  • Allik J (2014) A mixed-binomial model for Likert-type personality measure. Front Psychol 5:1–13

  • Baker SG (1992) A simple method for computing the observed information matrix when using the EM algorithm with categorical data. J Comput Graph Statist 1(1):63–76

  • Buckland ST, Burnham KP, Augustin NH (1997) Model selection: an integral part of inference. Biometrics 53:603–618

  • Burnham KP, Anderson DR (2003) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York

  • Capecchi S, Piccolo D (2017) Dealing with heterogeneity in ordinal responses. Qual Quant 51:2375–2393

  • Cappelli C, Simone R, Di Iorio F (2019) CUBREMOT: a tool for building model-based trees for ordinal responses. Expert Syst Appl 124:39–49

  • Colombi R, Giordano S (2016) A class of mixture models for multidimensional ordinal data. Statist Model 16(4):322–340

  • Corduas M (2011) Assessing similarity of rating distributions by Kullback–Leibler divergence. In: Fichet A et al (eds) Classification and multivariate analysis for complex data structures, studies in classification, data analysis, and knowledge organization. Springer, Berlin, Heidelberg, pp 221–228

  • D’Elia A, Piccolo D (2005) A mixture model for preference data analysis. Comput Stat Data Anal 49:917–934

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc Ser B 39(1):1–38

  • GESIS Leibniz Institute for the Social Sciences (2016) German General Social Survey (ALLBUS)—Cumulation 1980-2014, GESIS Data Archive, Cologne. ZA4584 Data file version 1.0.0. https://doi.org/10.4232/1.12574

  • Gormley IC, Frühwirth-Schnatter S (2019) Mixture of experts models, Chapter 12. In: Frühwirth-Schnatter S, Celeux G, Robert CP (eds) Handbook of mixture analysis, 1st edn. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. https://doi.org/10.1201/9780429055911

  • Gottard A, Iannario M, Piccolo D (2016) Varying uncertainty in cub models. Adv Data Anal Classif 10(2):225–244

  • Iannario M (2008) Selecting feeling covariates in rating surveys. Statist Appl 20(2):121–134

  • Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron LXVIII(1):87–94

  • Iannario M (2012) Preliminary estimators for a mixture model of ordinal data. Adv Data Anal Classif 6(3):163–184

  • Iannario M, Monti AC, Piccolo D, Ronchetti E (2017) Robust inference for ordinal response models. Electron J Statist 11:3407–3445

  • Iannario M, Piccolo D, Simone R (2018) CUB: a class of mixture models for ordinal data. (R package version 1.1.3), http://CRAN.R-project.org/package=CUB

  • Ibrahim JG (1990) Incomplete data in generalized linear models. J Am Statist Assoc 85:765–769

  • Khalili A, Chen J (2007) Variable selection in finite mixture of regression models. J Am Statist Assoc 102(479):1025–1038

  • Louis TA (1976) Maximum likelihood estimation using pseudo-data iterations. Boston University Research Report No. 2-76

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Statist Soc Ser B 44:226–233

  • Mahalanobis PC (1936) On the generalised distance in statistics. Proc National Inst Sci India 2(1):49–55

  • Manisera M, Zuccolotto P (2014) Modeling rating data with Non Linear CUB models. Comput Stat Data Anal 78:100–118

  • McCullagh P (1980) Regression models for ordinal data. J R Statist Soc Ser B 42(2):109–142

  • McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley Series in Probability and Statistics. Wiley, Hoboken

  • Meilijson I (1989) A fast improvement of the EM algorithm on its own terms. J R Statist Soc Ser B 51:127–138

  • Meng XL, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Statist Assoc 86(416):899–909

  • Miller K (1981) On the inverse of the sum of matrices. Math Mag 54(2):67–72

  • Oakes D (1999) Direct calculation of the information matrix via the EM. J R Statist Soc Ser B 61(2):479–482

  • Orchard T, Woodbury MA (1972) A missing information principle: theory and applications. In: Proceedings of the sixth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 697–715

  • Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Statistica 5:85–104

  • Piccolo D (2006) Observed information matrix for MUB models. Quaderni di Statistica 8:33–78

  • Piccolo D, Simone R (2019a) The class of cub models: statistical foundations, inferential issues and empirical evidence. Statist Method Appl 28:389–435 (with discussions)

  • Piccolo D, Simone R (2019b) Rejoinder to the discussion of The class of cub models: statistical foundations, inferential issues and empirical evidence. Statist Method Appl 28:477–493

  • Piccolo D, Simone R, Iannario M (2019) Cumulative and cub models for rating data: a comparative analysis. Int Statist Rev 87(2):207–236

  • Pinto da Costa JF, Alonso H, Cardoso JS (2008) The unimodal model for the classification of ordinal data. Neural Netw 21:78–91. Corrigendum: Neural Netw 59:73–75 (2014)

  • Simone R (2020) FastCUB: Fast EM and Best-Subset Selection for CUB Models for Rating Data. R package version 0.0.2. https://CRAN.R-project.org/package=FastCUB

  • Simone R, Cappelli C, Di Iorio F (2019) Modelling marginal ranking distributions: the uncertainty tree. Pattern Recognit Lett 125(1):278–288

  • Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Statist Neerlandica 72(3):224–245

  • Simone R, Tutz G, Iannario M (2020) Subjective heterogeneity in response attitude for multivariate ordinal outcomes. Econ Statist 14:145–158

  • Sundberg R (1976) An iterative method for solution of the likelihood equations for incomplete data from exponential families. Commun Statist Simul Comput B5(1):55–64

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Statist Soc Ser B 58:267–288

  • Tutz G (2012) Regression for categorical data. Cambridge University Press, Cambridge

  • Zhou H, Lange K (2009) Rating movies and rating the raters who rate them. Am Stat 63:297–307


Acknowledgements

The research has been partially funded by the ‘cub Regression Model Trees project’ (project number: 000025_ALTRI_DR_1043_2017-C-CAPPELLI) of the University of Naples Federico II, Italy.

Author information

Corresponding author

Correspondence to Rosaria Simone.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: the EM algorithm for CUB models

Given the notation set in Sects. 2 and 4, for a sample \(\varvec{R} = (R_1,\ldots ,R_n)^{\prime }\) of ordinal scores collected on a scale with m categories, consider the full cub specification with covariates given in (1)–(2). Then \(\varvec{R}\) denotes the so-called incomplete data; let \(\varvec{X} = (\varvec{R}^{\prime }, \varvec{Z}^{\prime })^{\prime }\) be the complete data, with missing data \(\varvec{Z} = (Z_{1},\ldots , Z_n)^{\prime }\) given by:

$$\begin{aligned} Z_{i}= {\left\{ \begin{array}{ll} 1 &\quad \text {if the } i\text {-th observation is drawn from the feeling component} \\ 0 &\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

To be more specific, one should set \(\varvec{Z}_{1} = \varvec{Z}\) and \(\varvec{Z}_{2} = 1 - \varvec{Z}_1\) for the uncertainty component. Then, with obvious notation, consider the complete log-likelihood:

$$\begin{aligned} l_c(\varvec{\theta }; \varvec{R}, \varvec{Z}) = \sum _{i=1}^n Z_{1i}\, \log \big (\pi _i\;b_{R_i}(\xi _i) \big ) + \sum _{i=1}^n (1-Z_{1i})\, \log \big ((1-\pi _i)\dfrac{1}{m} \big ). \end{aligned}$$
(23)
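
For concreteness, the following minimal sketch renders (23) numerically. It assumes the standard cub shifted-binomial feeling component \(b_r(\xi ) = \left( {\begin{array}{c}m-1\\ r-1\end{array}}\right) \xi ^{m-r}(1-\xi )^{r-1}\), a form consistent with the score derivatives reported in the second appendix; all function names are illustrative.

```python
# A minimal sketch, assuming the standard CUB shifted-binomial feeling
# component; names (shifted_binomial, complete_loglik) are illustrative.
import numpy as np
from scipy.special import comb

def shifted_binomial(r, xi, m):
    """b_r(xi) = C(m-1, r-1) xi^(m-r) (1-xi)^(r-1) on a scale with m categories."""
    return comb(m - 1, r - 1) * xi**(m - r) * (1 - xi)**(r - 1)

def complete_loglik(r, z, pi, xi, m):
    """Complete-data log-likelihood (23); r, z, pi, xi are length-n arrays."""
    return np.sum(z * np.log(pi * shifted_binomial(r, xi, m))
                  + (1 - z) * np.log((1 - pi) / m))
```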

At the k-th iteration and for a realization \(\varvec{r} = (r_1,\ldots ,r_n)^{\prime }\) of \(\varvec{R}\), the procedure first computes the posterior probability that each observation is drawn from each component, namely:

$$\begin{aligned} \tau _{1i}^{(k)} = \dfrac{\pi _i^{(k)}\, b_{r_i}(\xi _i^{(k)})}{Pr(R_i=r_i \mid \varvec{\theta }^{(k)}, \varvec{y}_i, \varvec{w}_i)}, \end{aligned}$$
(24)
$$\begin{aligned} \tau _{2i}^{(k)} = 1- \tau _{1i}^{(k)} = \dfrac{1}{m}\,\dfrac{1-\pi _i^{(k)}}{Pr(R_i=r_i \mid \varvec{\theta }^{(k)}, \varvec{y}_i, \varvec{w}_i)} \end{aligned}$$
(25)

where one sets:

$$\begin{aligned} {\text {logit}}(\pi _i^{(k)}) = \bar{\varvec{y}}_i\, \varvec{\beta }^{(k)}, \qquad {\text {logit}}(\xi _i^{(k)}) = \bar{\varvec{w}}_i\, \varvec{\gamma }^{(k)}, \qquad i=1,\ldots ,n. \end{aligned}$$

Thus, at the k-th step, the conditional expectation of the complete log-likelihood (23) to be maximized over \(\varvec{\theta }\) is given by:

$$\begin{aligned} Q(\varvec{\theta }; \varvec{\theta }^{(k)}) &= {\mathbb {E}}_{\varvec{\theta }^{(k)}}\big [l_c(\varvec{\theta }; \varvec{R}, \varvec{Z})\,\big |\, \varvec{R} = \varvec{r}\big ] \\ &= \sum _{i=1}^{n} \tau _{1i}^{(k)} \log (\pi _i(\varvec{\beta })) + \sum _{i=1}^{n}(1- \tau _{1i}^{(k)}) \log (1 - \pi _i(\varvec{\beta })) \\ &\quad + \sum _{i=1}^{n} \tau _{1i}^{(k)} \log (b_{r_i}(\xi _i(\varvec{\gamma }))) + \sum _{i=1}^{n} (1- \tau _{1i}^{(k)}) \log \big (\dfrac{1}{m}\big ) \end{aligned}$$

yielding the updated estimate \(\varvec{\theta }^{(k+1)}\). These steps are iterated until convergence is attained within a given numerical tolerance. The Nelder–Mead algorithm is used for the maximization steps over \(\varvec{\theta } = (\varvec{\beta },\varvec{\gamma })^{\prime }\), as in the sketch below.
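
The following sketch assembles these E- and M-steps into a bare-bones EM loop, under the same assumptions as the previous sketch (shifted-binomial feeling component, logit links, design matrices including intercept columns). It is an illustration only: em_cub and its argument names are hypothetical, and the reference implementations remain the CUB and FastCUB R packages cited above.

```python
# A minimal EM loop for a CUB model with covariates; illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import comb, expit  # expit is the inverse logit

def em_cub(r, Y, W, m, tol=1e-6, max_iter=500):
    """r: length-n vector of ratings in 1..m; Y, W: design matrices
    (intercept columns included) for uncertainty and feeling."""
    b = lambda rr, xi: comb(m - 1, rr - 1) * xi**(m - rr) * (1 - xi)**(rr - 1)
    beta, gamma = np.zeros(Y.shape[1]), np.zeros(W.shape[1])
    old = -np.inf
    for _ in range(max_iter):
        # E-step, eqs. (24)-(25): posterior weight of the feeling component
        pi, xi = expit(Y @ beta), expit(W @ gamma)
        num = pi * b(r, xi)
        tau = num / (num + (1 - pi) / m)
        # M-step: maximize Q(theta; theta^(k)) by Nelder-Mead; the last term
        # of Q does not depend on theta and is dropped
        def negQ(theta):
            pi_t = expit(Y @ theta[:Y.shape[1]])
            xi_t = expit(W @ theta[Y.shape[1]:])
            return -np.sum(tau * np.log(pi_t) + (1 - tau) * np.log(1 - pi_t)
                           + tau * np.log(b(r, xi_t)))
        sol = minimize(negQ, np.concatenate([beta, gamma]), method="Nelder-Mead")
        beta, gamma = sol.x[:Y.shape[1]], sol.x[Y.shape[1]:]
        # Stop on the incomplete-data log-likelihood
        pi, xi = expit(Y @ beta), expit(W @ gamma)
        ll = np.sum(np.log(pi * b(r, xi) + (1 - pi) / m))
        if ll - old < tol:
            break
        old = ll
    return beta, gamma, tau
```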

Appendix: Louis’ identity for CUB models

In full generality, consider the cub model specification with \(p\ge 0\) and \(q\ge 0\) covariates for the uncertainty and feeling parameters, respectively. Notice that, since \({\text {logit}}(\pi _i) = \bar{\varvec{y}}_i \cdot \varvec{\beta }\) and \({\text {logit}}(\xi _i) = \bar{\varvec{w}}_i \cdot \varvec{\gamma }\), then

$$\begin{aligned} \dfrac{\partial \pi _i}{\partial \beta _j} = {\bar{y}}_{ij}\,\pi _i\,(1-\pi _i), \qquad \dfrac{\partial \xi _i}{\partial \gamma _j} = {\bar{w}}_{ij}\,\xi _i\,(1-\xi _i). \end{aligned}$$

For the complete information matrix (17), the first derivatives with respect to \(\beta _j\) and \(\gamma _l\) of the complete log-likelihood (23) are given by:

$$\begin{aligned} \dfrac{\partial \ell _c(\varvec{\theta })}{\partial \beta _j} = \sum _{i=1}^n {\bar{y}}_{ij} \big (Z_{1i} - \pi _i\big ), \qquad \dfrac{\partial \ell _c(\varvec{\theta })}{\partial \gamma _l} = \sum _{i=1}^n {\bar{w}}_{il} \,Z_{1i}\big ( m - R_i - \xi _i\,(m-1) \big ) \end{aligned}$$
(26)

from which it follows:

$$\begin{aligned} \dfrac{\partial ^2 \ell _c(\varvec{\theta })}{\partial \beta _j\, \partial \beta _k} &= - \sum _{i=1}^n {\bar{y}}_{ij}\,{\bar{y}}_{ik} \,\pi _i\,(1-\pi _i), \\ \dfrac{\partial ^2 \ell _c(\varvec{\theta })}{\partial \gamma _h\,\partial \gamma _l} &= - (m-1) \sum _{i=1}^n {\bar{w}}_{ih}\,{\bar{w}}_{il}\,Z_{1i}\,\xi _i\, (1-\xi _i). \end{aligned}$$

Taking the conditional expectation, given \(\varvec{R} = \varvec{r}\), of the negative of these second-order derivatives yields the block-wise definition in (17). The matrix specification (19) in Louis’ identity then follows straightforwardly, since \({\mathbb {E}}[Z_{1i}|\varvec{R}=\varvec{r}] = \tau _i\).

Starting from the complete score vector (26), matrix \({\mathcal {V}}_c\) in (18) can be obtained as follows:

$$\begin{aligned} {\mathcal {V}}_c[j,l] &= {\mathbb {E}}\Big [ \Big ( \sum _{i=1}^n {\bar{y}}_{ij}(Z_{1i} - \pi _i)\Big )\Big (\sum _{t=1}^n {\bar{y}}_{tl}(Z_{1t} - \pi _t)\Big ) \,\Big |\, \varvec{R} = \varvec{r} \Big ] \\ &= \sum _{i=1}^n \sum _{t=1}^n {\bar{y}}_{ij}{\bar{y}}_{tl}\,Cov\big (Z_{1i}-\pi _i,\,Z_{1t} - \pi _t \,\big |\, \varvec{R} = \varvec{r} \big ) \\ &\quad + {\mathbb {E}}\Big [\sum _{i=1}^n {\bar{y}}_{ij}(Z_{1i} - \pi _i)\,\Big |\,\varvec{R}=\varvec{r}\Big ]\,{\mathbb {E}}\Big [\sum _{t=1}^n {\bar{y}}_{tl}(Z_{1t} - \pi _t)\,\Big |\,\varvec{R}=\varvec{r}\Big ] \\ &= \sum _{i=1}^n {\bar{y}}_{ij}{\bar{y}}_{il}\, \tau _{1i}(1-\tau _{1i}) + \Big (\sum _{i=1}^n {\bar{y}}_{ij}(\tau _{1i}-\pi _i)\Big ) \Big (\sum _{i=1}^n {\bar{y}}_{il}(\tau _{1i}-\pi _i)\Big ) \\ &= {\varvec{Y}}_{\varvec{\tau }}[,j]\cdot \varvec{Y}[,l] + {\mathcal {V}}[j,l] \end{aligned}$$

according to the notation introduced in Sect. 4 (notice that the second-to-last row of the above identity uses the fact that, given \(\varvec{R}=\varvec{r}\), the \(Z_{1i}\)’s are independent Bernoulli random variables with success probabilities \(\tau _{1i}\)). Similar steps can be easily pursued for the other blocks of the matrix (18).

Finally, the score vector for the incomplete data problem is obtained by taking the first partial derivatives of the incomplete log-likelihood (3) with respect to \(\beta _j\), \(j=0,\ldots ,p\) and \(\gamma _l\), \(l=0,\ldots ,q\):

$$\begin{aligned} \dfrac{\partial \ell (\varvec{\theta })}{\partial \beta _j} = \sum _{i=1}^n {\bar{y}}_{ij} \big ( \tau _i - \pi _i \big ), \qquad \dfrac{\partial \ell (\varvec{\theta })}{\partial \gamma _l} = -\sum _{i=1}^n {\bar{w}}_{il} \,\tau _i\, a_i \end{aligned}$$

with \(a_i = 1- r_i + (m-1)(1-\xi _i)\). Accordingly, matrix (19) is obtained as the outer (column-by-row) product of the incomplete score vector with its transpose, as illustrated in the sketch below.
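
By way of illustration, the following sketch evaluates the \(\varvec{\beta }\)-block of the observed information matrix via Louis’ identity, exploiting the fact derived above that \({\mathcal {V}}_c\) minus the outer product of the incomplete score reduces to the conditional covariance term. It is a sketch under the previous assumptions (pi and tau evaluated at the EM estimates), not the FastCUB implementation; the \(\varvec{\gamma }\) and cross blocks would be handled analogously for the full matrix.

```python
# Sketch: beta-beta block of the observed information via Louis' identity,
# I_obs = E[I_c | R] - (V_c - S S'), evaluated at the EM estimates.
import numpy as np

def louis_beta_block(Y, pi, tau):
    """Y: n x (p+1) design matrix; pi, tau: fitted uncertainty parameters
    and posterior weights (E-step output at convergence)."""
    # Complete-information block (17): sum_i y_i y_i' pi_i (1 - pi_i)
    I_c = (Y * (pi * (1 - pi))[:, None]).T @ Y
    # Missing information: V_c (18) minus the score outer product (19)
    # reduces to sum_i y_i y_i' tau_i (1 - tau_i), by the derivation above
    missing = (Y * (tau * (1 - tau))[:, None]).T @ Y
    return I_c - missing

# Standard errors for beta would follow from the inverse of the full
# observed information; neglecting the cross blocks with gamma, a rough
# approximation is:
# se_beta = np.sqrt(np.diag(np.linalg.inv(louis_beta_block(Y, pi, tau))))
```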

About this article

Cite this article

Simone, R. An accelerated EM algorithm for mixture models with uncertainty for rating data. Comput Stat 36, 691–714 (2021). https://doi.org/10.1007/s00180-020-01004-z

