
Standard and Novel Model Selection Criteria in the Pairwise Likelihood Estimation of a Mixture Model for Ordinal Data

  • Conference paper
Analysis of Large and Complex Data

Abstract

Model selection in a mixture setting has been studied extensively in the literature as a way to assess the number of components. Several classes of criteria exist; we focus on those that penalize the log-likelihood with a term accounting for model complexity. However, the full likelihood is not always computationally feasible. To overcome this issue, the likelihood is replaced with a surrogate objective function, and a question then arises naturally: how does the use of a surrogate objective function affect the definition of model selection criteria? Model selection and model estimation are distinct issues. Even if no cause-and-effect relationship can be established between them, they are linked through the likelihood. In both cases the likelihood must be approximated, and it is computationally efficient to use the same surrogate function for both. The aim of this paper is not to provide an exhaustive survey of model selection, but to review the criteria most commonly used in a standard mixture setting and to show how they can be adapted to a non-standard context. In the last decade, two criteria based on the observed composite likelihood were introduced. Here, we propose new extensions of the standard criteria based on the expected complete log-likelihood to the non-standard context of a pairwise likelihood approach. The main advantage is a less demanding and more stable estimation. Finally, a simulation study is conducted to test the proposed criteria and compare their performance with that of the existing ones. As discussed in detail in Sect. 7, the novel criteria work very well in all scenarios considered.
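All of the criteria in question share the same penalized form: minus twice a maximized objective function plus a term that grows with model complexity. As a minimal sketch of this form (Python; our own illustration, not code from the paper), the functions below compute the standard AIC and BIC from a maximized full log-likelihood, together with their composite-likelihood analogues (Varin & Vidoni, 2005; Gao & Song, 2010), in which the raw parameter count is replaced by the effective dimension tr(J H^{-1}), where H is the sensitivity matrix and J the variability matrix of the composite score.

```python
import numpy as np

def aic(loglik, n_params):
    """Standard AIC: minus twice the maximized log-likelihood plus 2 * #parameters."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Standard BIC: the complexity penalty grows with the sample size."""
    return -2.0 * loglik + n_params * np.log(n_obs)

def effective_dim(H, J):
    """Effective dimension tr(J H^{-1}) used by composite-likelihood criteria.

    H: sensitivity matrix (minus the expected Hessian of the composite
    log-likelihood); J: variability matrix (covariance of the composite
    score). For a correctly specified full likelihood H = J, so this
    reduces to the number of free parameters.
    """
    return np.trace(J @ np.linalg.inv(H))

def cl_aic(cloglik, H, J):
    """AIC-type criterion for a maximized composite (e.g. pairwise) log-likelihood."""
    return -2.0 * cloglik + 2.0 * effective_dim(H, J)

def cl_bic(cloglik, H, J, n_obs):
    """BIC-type criterion for a maximized composite log-likelihood."""
    return -2.0 * cloglik + effective_dim(H, J) * np.log(n_obs)
```

In practice the mixture is fitted for a range of numbers of components and the model minimizing the chosen criterion is retained; the novel criteria proposed in the paper apply the same logic with the expected complete pairwise log-likelihood in place of the observed one.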

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.

  • Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 30(1), 9–14.

  • Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.

  • Biernacki, C., Celeux, G., & Govaert, G. (1999). An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognition Letters, 20(3), 267–272.

  • Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.

  • Biernacki, C., & Govaert, G. (1997). Using the classification likelihood to choose the number of clusters. Computing Science and Statistics, 29, 451–457.

  • Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., & Lindsay, B. G. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46(2), 373–388.

  • Bozdogan, H. (1983). Determining the number of component clusters in the standard multivariate normal mixture model using model-selection criteria (Technical Report No. UIC/DQM/A83-1). Chicago: University of Illinois at Chicago Circle, Department of Quantitative Methods.

  • Bozdogan, H. (1993). Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification. Studies in classification, data analysis and knowledge organization (pp. 40–54). Berlin, Heidelberg: Springer.

  • Celeux, G., & Soromenho, G. (1996). An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification, 13(2), 195–212.

  • Everitt, B. (1988). A finite mixture model for the clustering of mixed-mode data. Statistics & Probability Letters, 6(5), 305–309.

  • Everitt, B., & Merette, C. (1990). The clustering of mixed-mode data: a comparison of possible approaches. Journal of Applied Statistics, 17(3), 283–297.

  • Gao, X., & Song, P. X. K. (2010). Composite likelihood Bayesian information criteria for model selection in high-dimensional data. Journal of the American Statistical Association, 105(492), 1531–1540.

  • Hathaway, R. J. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics & Probability Letters, 4(2), 53–56.

  • Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhya: The Indian Journal of Statistics, Series A, 62, 49–66.

  • Leroux, B. G. (1992). Consistent estimation of a mixing distribution. The Annals of Statistics, 20(3), 1350–1360.

  • Liang, Z., Jaszczak, R. J., & Coleman, R. E. (1992). Parameter estimation of finite mixtures using the EM algorithm and information criteria with application to medical image processing. IEEE Transactions on Nuclear Science, 39(4), 1126–1133.

  • Lindsay, B. G. (1983). Efficiency of the conditional score in a mixture setting. The Annals of Statistics, 11, 486–497.

  • Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80, 221–239.

  • Lubke, G., & Neale, M. (2008). Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models. Multivariate Behavioral Research, 43(4), 592–620.

  • McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics, 36, 318–324.

  • Ranalli, M., & Rocci, R. (2014). Mixture models for ordinal data: a pairwise likelihood approach. Statistics and Computing. doi: 10.1007/s11222-014-9543-4.

  • Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica: Journal of the Econometric Society, 46, 1273–1291.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Shibata, R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics, 8, 147–164.

  • Sugiura, N. (1978). Further analysts of the data by Akaike’s information criterion and the finite corrections. Communications in Statistics-Theory and Methods, 7(1), 13–26.

  • Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. Mathematical Sciences, 153, 12–18 (in Japanese).

  • Varin, C., Reid, N., & Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 21(1), 1–41.

  • Varin, C., & Vidoni, P. (2005). A note on composite likelihood inference and model selection. Biometrika, 92(3), 519–528.



Appendix

Maximizing the observed pairwise log-likelihood is equivalent to maximizing the fuzzy classification pairwise log-likelihood. This partially justifies the behaviour of the criteria based on the expected complete pairwise log-likelihood. In this appendix we derive the pairwise EN term, which is useful for two reasons. First, once the pairwise EN term is defined, the criteria based on the expected complete pairwise log-likelihood can be seen as the observed pairwise likelihood penalized by this term. Second, it gives an idea of the separation between the mixture components. In the derivation below, the second equality holds because the posterior probabilities sum to one over the components, and the third adds and subtracts the expected complete pairwise log-likelihood.

$$\begin{aligned}
p\ell(\boldsymbol{\psi};\mathbf{X})
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \log\left[\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})\right]\\
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})\right]\\
&\quad + \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&\quad - \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&= \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\left[p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\right]\\
&\quad - \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log\frac{p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})}{\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})}\\
&= p\ell_c(\boldsymbol{\psi};\mathbf{X}) - EN_{pl}(\mathbf{p})
\end{aligned}$$
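Reading off the last equality, and writing $p_{c_ic_j;g}^{(ij)} = p_g\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_g,\boldsymbol{\varSigma}_g,\boldsymbol{\gamma})\big/\sum_{h=1}^{G} p_h\,\boldsymbol{\pi}_{c_ic_j}^{(ij)}(\boldsymbol{\mu}_h,\boldsymbol{\varSigma}_h,\boldsymbol{\gamma})$ for the pairwise posterior probabilities, the pairwise EN term is

$$EN_{pl}(\mathbf{p}) = \sum_{i=1}^{P-1}\sum_{j=i+1}^{P}\sum_{c_i=1}^{C_i}\sum_{c_j=1}^{C_j} n_{c_ic_j}^{(ij)} \sum_{g=1}^{G} p_{c_ic_j;g}^{(ij)} \log p_{c_ic_j;g}^{(ij)},$$

which is non-positive and close to zero exactly when the posteriors are close to zero or one, that is, when the mixture components are well separated. As a small numerical illustration, the term can be computed directly from the fitted pairwise posteriors; the sketch below (Python/NumPy) is our own, with an assumed array layout, and is not code from the paper.

```python
import numpy as np

def pairwise_en(counts, posteriors, eps=1e-300):
    """Pairwise EN term EN_pl(p) of the appendix.

    counts:     one array per variable pair (i, j), of shape (C_i, C_j),
                holding the observed cell counts n^{(ij)}_{c_i c_j}.
    posteriors: matching arrays of shape (C_i, C_j, G) holding the pairwise
                posterior probabilities p^{(ij)}_{c_i c_j; g}.
    Returns a non-positive number; values near zero signal well-separated
    mixture components.
    """
    en = 0.0
    for n_ij, post_ij in zip(counts, posteriors):
        # inner sum over components g of p log p, one value per (c_i, c_j) cell
        cell_term = np.sum(post_ij * np.log(post_ij + eps), axis=-1)
        # weight by the observed cell counts and accumulate over variable pairs
        en += np.sum(n_ij * cell_term)
    return en
```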


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ranalli, M., Rocci, R. (2016). Standard and Novel Model Selection Criteria in the Pairwise Likelihood Estimation of a Mixture Model for Ordinal Data. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_5
