Abstract
We present a novel criterion for estimating a latent partition of the observed groups from the output of a hierarchical model. The criterion is based on a loss function that combines the Gini income inequality ratio with the predictability index of Goodman and Kruskal, so as to achieve maximum heterogeneity of random effects across groups and maximum homogeneity of predicted probabilities within estimated clusters. It is compared with alternative approaches in a simulation study and applied in a case study on the role of hospital-level variables in the decision to perform a cesarean section.
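As a rough illustration of the two ingredients named in the abstract, the sketch below computes the classical Gini ratio of a set of non-negative values and the Goodman and Kruskal tau for a two-way contingency table. It is not the loss function proposed in the paper: how the two indices are combined is not stated here, and the function names `gini_ratio` and `goodman_kruskal_tau` are hypothetical, as is the assumption that the inputs are non-negative values (for example predicted probabilities) and a table of counts.

```python
import numpy as np


def gini_ratio(values):
    """Classical Gini ratio: mean absolute difference divided by twice the mean.

    Assumes non-negative inputs (an assumption of this sketch); returns 0.0
    for an empty input or a zero mean.
    """
    x = np.asarray(values, dtype=float)
    if x.size == 0 or x.mean() == 0:
        return 0.0
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    diffs = np.abs(x[:, None] - x[None, :])
    return diffs.sum() / (2 * x.size ** 2 * x.mean())


def goodman_kruskal_tau(table):
    """Goodman and Kruskal tau for predicting the column variable from the rows.

    `table` is a two-way array of counts. The result is the proportional
    reduction in Gini impurity of the column variable once the row is known.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # joint proportions p_ij
    row = p.sum(axis=1)                  # row marginals p_i.
    col = p.sum(axis=0)                  # column marginals p_.j
    baseline = 1.0 - np.sum(col ** 2)    # impurity of the column variable alone
    if baseline == 0:
        return 0.0
    keep = row > 0                       # skip empty rows to avoid division by zero
    within = np.sum(p[keep] ** 2 / row[keep, None])
    return (within - np.sum(col ** 2)) / baseline


if __name__ == "__main__":
    print(gini_ratio([0.1, 0.1, 0.1, 0.1]))          # equal values, no inequality
    print(goodman_kruskal_tau([[10, 0], [0, 10]]))   # rows fully predict columns
```

In this sketch a Gini ratio of 0 means all values are equal, and a tau of 1 means the row category determines the column category exactly, while a tau of 0 means the row gives no predictive gain.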
References
Berger M, Tutz G (2018) Tree-structured clustering in fixed effects models. J Comput Graph Stat 27(2):380–392
Bragg F, Cromwell DA, Edozien L (2010) Variation in rates of caesarean section among English NHS trusts after accounting for maternal and clinical risk: cross sectional study. BMJ 341:c5065. https://doi.org/10.1136/bmj.c5065
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Caceres IA, Arcaya M, Declercq E, Belanoff CM, Janakiraman V, Cohen B, Ecker J, Smith LA, Subramanian SV (2013) Hospital differences in cesarean deliveries in Massachusetts (US) 2004–2006: the case against case-mix artifact. PLoS ONE 8(3):e57817
Cannas M, Conversano C, Mola F, Sironi E (2017) Variation in caesarean delivery rates across hospitals: a Bayesian semi-parametric approach. J Appl Stat 44(12):2095–2107
Dagum C (1997) A new approach to the decomposition of the Gini income inequality ratio. Empir Econ 22:515–531
Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Muller P, Vannucci M (eds) Bayesian inference for gene expression and proteomics. Cambridge University Press, Cambridge, pp 201–218
Dahl DB (2009) Modal clustering in a class of product partition models. Bayesian Anal 4:243–264
Duncan C, Jones K, Moon G (1998) Context, composition and heterogeneity: using multilevel models in health research. Soc Sci Med 46:97–117
Dunson D (2008) Nonparametric Bayes applications to biostatistics (Tech. Rep.). Biostatistics Branch, National Institute of Environmental Health Sciences, U.S. National Institutes of Health, USA
Egidi L, Pappadá R, Pauli F, Torelli N (2018) Relabelling in Bayesian mixture models by pivotal units. Stat Comput 28(4):957–969
European Perinatal Health Report (2013) The health and care of pregnant women and babies in Europe in 2010. EURO-PERISTAT Project with SCPE and EUROCAT, Bruxelles
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Fritsch A, Ickstadt K (2009) Improved criteria for clustering based on the posterior similarity matrix. Bayesian Anal 4:367–392
Goodman LA, Kruskal WH (1954) Measures of association for cross classifications. J Am Stat Assoc 49:732–764
Grilli L, Panzera A, Rampichini C (2018) Clustering upper level units in multilevel models for ordinal data. In: Mola F, Conversano C, Vichi M (eds) Classification, (big) data analysis and statistical learning. Springer, Cham, pp 137–144
Guglielmi A, Ieva F, Paganoni AM, Ruggeri F, Soriano J (2014) Semiparametric Bayesian models for clustering and classification in the presence of unbalanced in-hospital survival. J R Stat Soc C (Appl Stat) 63:25–46
Heinzl F, Tutz G (2014) Clustering in linear mixed models with a group fused lasso penalty. Biom J 1:44–68
Jara A, Hanson T, Quintana F, Mueller P, Rosner G (2011) DPpackage: Bayesian semi-and nonparametric modeling in R. J Stat Softw 40(5):1–30
Kleinman KP, Ibrahim JG (1998) A semi-parametric Bayesian approach to generalized linear mixed models. Stat Med 17:2579–2596
Kozhimannil KB, Law MR, Virnig BA (2013) Cesarean delivery rates vary among US hospitals: reducing variation may address quality and cost issues. Health Aff 32(3):527–535
Lau JW, Green PJ (2007) Bayesian model-based clustering procedures. J Comput Graph Stat 16:526–558
Lee Y, Roberts CL, Patterson JA, Simpson JM, Nicholl MC, Morris JM, Ford JB (2013) Unexplained variation in hospital caesarean section rates. Med J Aust 199(5):348–353
MacEachern SN (2000) Dependent nonparametric processes, Technical report. Dept. of Statistics, Ohio State University, Ohio
Medvedovic M, Yeung K, Bumgarner R (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232
Meila M (2007) Comparing clusterings: an information based distance. J Multivar Anal 98:873–895
Mola F, Siciliano R (1997) A fast splitting procedure for classification trees. Stat Comput 7:209–216
Pauger D, Wagner H (2018) Bayesian effect fusion for categorical predictors. Bayesian Anal. https://doi.org/10.1214/18-BA1096
Pitman J, Yor M (1997) The two-parameter Poisson Dirichlet distribution derived from a stable subordinator. Ann Probab 25:855–900
Rastelli R, Friel N (2017) Optimal Bayesian estimators for latent variable cluster models. Stat Comput 28(6):1169–1186
Roberts CL, Nippita TA (2015) International caesarean section rates: the rising tide. Lancet Glob Health 3(5):111–117
Sturtz S, Ligges U, Gelman A (2005) R2WinBUGS: a package for running WinBUGS from R. J Stat Softw 12(3):1–16
Tutz G, Oelker M (2017) Modeling clustered heterogeneity: fixed effects, random effects and mixtures. Int Stat Rev 85(2):204–227
Wade S, Ghahramani Z (2018) Bayesian cluster analysis: point estimation and credible balls. Bayesian Anal 13(2):559–626
Acknowledgements
We would like to thank the Autonomous Region of Sardinia for providing the data used in Sect. 6. We also thank the editors and the two anonymous referees for their comments, which helped us improve the quality of the paper in several parts.
About this article
Cite this article
Conversano, C., Cannas, M., Mola, F. et al. Random effects clustering in multilevel modeling: choosing a proper partition. Adv Data Anal Classif 13, 279–301 (2019). https://doi.org/10.1007/s11634-018-0347-9
DOI: https://doi.org/10.1007/s11634-018-0347-9
Keywords
- Hierarchical modelling
- Model based clustering
- Label switching
- Bayesian nonparametric
- Gini income inequality ratio
- Goodman and Kruskal predictability index