
Standard and Novel Model Selection Criteria in the Pairwise Likelihood Estimation of a Mixture Model for Ordinal Data

  • Conference paper
Analysis of Large and Complex Data

Abstract

Model selection in a mixture setting has been studied extensively in the literature as a way to assess the number of components. Several classes of criteria exist; we focus on those that penalize the log-likelihood with a term accounting for model complexity. However, the full likelihood is not always computationally feasible. To overcome this issue, the likelihood is replaced with a surrogate objective function, and a question then arises naturally: how does the use of a surrogate objective function affect the definition of model selection criteria? Model selection and model estimation are distinct issues. Even if no cause-and-effect relationship can be established between them, they are linked through the likelihood. In both cases the likelihood must be approximated, and it is computationally efficient to use the same surrogate function for both. The aim of this paper is not to provide an exhaustive survey of model selection, but to review the criteria most commonly used in a standard mixture setting and to show how they can be adapted to a non-standard context. In the last decade, two criteria based on the observed composite likelihood were introduced. Here, we propose new extensions of the standard criteria based on the expected complete log-likelihood to the non-standard context of a pairwise likelihood approach. The main advantage is a less demanding and more stable estimation. Finally, a simulation study is conducted to test the proposed criteria and compare their performance with that of the existing ones. As discussed in detail in Sect. 7, the novel criteria work very well in all scenarios considered.
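All of the criteria in question share the same penalized form: minus twice a maximized objective function plus a term that grows with model complexity. As a minimal sketch of this form (Python; our own illustration, not code from the paper), the functions below compute the standard AIC and BIC from a maximized full log-likelihood, together with their composite-likelihood analogues (Varin & Vidoni, 2005; Gao & Song, 2010), in which the raw parameter count is replaced by the effective dimension tr(J H^{-1}), where H is the sensitivity matrix and J the variability matrix of the composite score.

```python
import numpy as np

def aic(loglik, n_params):
    """Standard AIC: minus twice the maximized log-likelihood plus 2 * #parameters."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Standard BIC: the complexity penalty grows with the sample size."""
    return -2.0 * loglik + n_params * np.log(n_obs)

def effective_dim(H, J):
    """Effective dimension tr(J H^{-1}) used by composite-likelihood criteria.

    H: sensitivity matrix (minus the expected Hessian of the composite
    log-likelihood); J: variability matrix (covariance of the composite
    score). For a correctly specified full likelihood H = J, so this
    reduces to the number of free parameters.
    """
    return np.trace(J @ np.linalg.inv(H))

def cl_aic(cloglik, H, J):
    """AIC-type criterion for a maximized composite (e.g. pairwise) log-likelihood."""
    return -2.0 * cloglik + 2.0 * effective_dim(H, J)

def cl_bic(cloglik, H, J, n_obs):
    """BIC-type criterion for a maximized composite log-likelihood."""
    return -2.0 * cloglik + effective_dim(H, J) * np.log(n_obs)
```

In practice the mixture is fitted for a range of numbers of components and the model minimizing the chosen criterion is retained; the novel criteria proposed in the paper apply the same logic with the expected complete pairwise log-likelihood in place of the observed one.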

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.

  • Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 30(1), 9–14.

  • Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.

  • Biernacki, C., Celeux, G., & Govaert, G. (1999). An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognition Letters, 20(3), 267–272.

  • Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.

  • Biernacki, C., & Govaert, G. (1997). Using the classification likelihood to choose the number of clusters. Computing Science and Statistics, 29, 451–457.

  • Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., & Lindsay, B. G. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46(2), 373–388.

  • Bozdogan, H. (1983). Determining the number of component clusters in the standard multivariate normal mixture model using model-selection criteria (Technical Report No. UIC/DQM/A83-1). Chicago: University of Illinois at Chicago Circle, Department of Quantitative Methods.

  • Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification. Studies in classification, data analysis and knowledge organization (pp. 40–54). Berlin, Heidelberg: Springer.

  • Celeux, G., & Soromenho, G. (1996). An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification, 13(2), 195–212.

  • Everitt, B. (1988). A finite mixture model for the clustering of mixed-mode data. Statistics & Probability Letters, 6(5), 305–309.

  • Everitt, B., & Merette, C. (1990). The clustering of mixed-mode data: a comparison of possible approaches. Journal of Applied Statistics, 17(3), 283–297.

  • Gao, X., & Song, P. X. K. (2010). Composite likelihood Bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105(492), 1531–1540.

  • Hathaway, R. J. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics & Probability Letters, 4(2), 53–56.

  • Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhya: The Indian Journal of Statistics, Series A, 62, 49–66.

  • Leroux, B. G. (1992). Consistent estimation of a mixing distribution. The Annals of Statistics, 20(3), 1350–1360.

  • Liang, Z., Jaszczak, R. J., & Coleman, R. E. (1992). Parameter estimation of finite mixtures using the EM algorithm and information criteria with application to medical image processing. IEEE Transactions on Nuclear Science, 39(4), 1126–1133.

  • Lindsay, B. G. (1983). Efficiency of the conditional score in a mixture setting. The Annals of Statistics, 11, 486–497.

  • Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80, 221–239.

  • Lubke, G., & Neale, M. (2008). Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models. Multivariate Behavioral Research, 43(4), 592–620.

  • McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics, 36, 318–324.

  • Ranalli, M., & Rocci, R. (2014). Mixture models for ordinal data: a pairwise likelihood approach. Statistics and Computing. doi: 10.1007/s11222-014-9543-4.

  • Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica: Journal of the Econometric Society, 46, 1273–1291.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Shibata, R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics, 8, 147–164.

  • Sugiura, N. (1978). Further analysts of the data by Akaike’s information criterion and the finite corrections. Communications in Statistics-Theory and Methods, 7(1), 13–26.

  • Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. Mathematical Sciences, 153, 12–18 (in Japanese).

  • Varin, C., Reid, N., & Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 21(1), 1–41.

  • Varin, C., & Vidoni, P. (2005). A note on composite likelihood inference and model selection. Biometrika, 92(3), 519–528.



Appendix

Maximizing the observed pairwise log-likelihood is equivalent to maximizing the fuzzy classification pairwise log-likelihood. This partially justifies the behaviour of the criteria based on the expected complete pairwise log-likelihood. In this appendix we derive the pairwise EN term, which is useful for two reasons. First, once the pairwise EN term is defined, the criteria based on the expected complete pairwise log-likelihood can be seen as the observed pairwise likelihood penalized by this term. Second, it gives an idea of the separation between the mixture components. In the derivation below, the second equality holds because the posterior probabilities sum to one over the components, and the third adds and subtracts the expected complete pairwise log-likelihood.

$$\begin{aligned}
p\ell(\boldsymbol{\psi};\mathbf{X})
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \log\left[\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})\right]\\
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})\right]\\
&\quad + \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&\quad - \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&\quad - \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\frac{p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})}{\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})}\\
&= p\ell_c(\boldsymbol{\psi};\mathbf{X}) - EN_{pl}(\mathbf{p})
\end{aligned}$$
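Reading off the last equality, and writing $p_{c_ic_j;g}^{(ij)} = p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\big/\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})$ for the pairwise posterior probabilities, the pairwise EN term is

$$EN_{pl}(\mathbf{p}) = \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log p_{c_ic_j;g}^{(ij)},$$

which is non-positive and close to zero exactly when the posteriors are close to zero or one, that is, when the mixture components are well separated. As a small numerical illustration, the term can be computed directly from the fitted pairwise posteriors; the sketch below (Python/NumPy) is our own, with an assumed array layout, and is not code from the paper.

```python
import numpy as np

def pairwise_en(counts, posteriors, eps=1e-300):
    """Pairwise EN term EN_pl(p) of the appendix.

    counts:     one array per variable pair (i, j), of shape (C_i, C_j),
                holding the observed cell counts n^{(ij)}_{c_i c_j}.
    posteriors: matching arrays of shape (C_i, C_j, G) holding the pairwise
                posterior probabilities p^{(ij)}_{c_i c_j; g}.
    Returns a non-positive number; values near zero signal well-separated
    mixture components.
    """
    en = 0.0
    for n_ij, post_ij in zip(counts, posteriors):
        # inner sum over components g of p log p, one value per (c_i, c_j) cell
        cell_term = np.sum(post_ij * np.log(post_ij + eps), axis=-1)
        # weight by the observed cell counts and accumulate over variable pairs
        en += np.sum(n_ij * cell_term)
    return en
```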


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ranalli, M., Rocci, R. (2016). Standard and Novel Model Selection Criteria in the Pairwise Likelihood Estimation of a Mixture Model for Ordinal Data. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_5
