Random effects clustering in multilevel modeling: choosing a proper partition

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We present a novel criterion for estimating a latent partition of the observed groups from the output of a hierarchical model. The criterion is a loss function that combines the Gini income inequality ratio with the predictability index of Goodman and Kruskal, so as to maximize the heterogeneity of random effects across groups and the homogeneity of predicted probabilities within the estimated clusters. The index is compared with alternative approaches in a simulation study and applied in a case study on the role of hospital-level variables in the decision to perform a cesarean section.
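
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It assumes that the "predictability index" refers to Goodman and Kruskal's tau and that the Gini ratio is computed on nonnegative values; the function names `gini_ratio` and `goodman_kruskal_tau` are hypothetical, and how the article combines the two quantities into its loss function is not stated in the abstract, so that step is deliberately left out.

```python
from itertools import product


def gini_ratio(values):
    """Gini ratio: mean absolute difference divided by twice the mean.

    Assumes nonnegative values with a positive mean (illustrative only).
    """
    n = len(values)
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("the Gini ratio needs a positive mean")
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    total_abs_diff = sum(abs(a - b) for a, b in product(values, repeat=2))
    return total_abs_diff / (2 * n * n * mean)


def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the rows.

    `table` is a contingency table given as a list of rows of counts.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Gini impurity of the column variable when the rows are ignored.
    marginal = 1 - sum((c / n) ** 2 for c in col_totals)
    if marginal == 0:
        raise ValueError("the column variable has no variation")
    # Expected impurity of the column variable within each row.
    conditional = 1 - sum(
        cell ** 2 / (n * sum(row))
        for row in table
        if sum(row) > 0
        for cell in row
    )
    # Proportional reduction in impurity obtained by knowing the row.
    return (marginal - conditional) / marginal


# Equal values show no inequality; a perfectly predictive table gives tau = 1.
equal_effects = gini_ratio([1.0, 1.0, 1.0])
perfect_tau = goodman_kruskal_tau([[5, 0], [0, 5]])
independent_tau = goodman_kruskal_tau([[5, 5], [5, 5]])
```

In this reading, a higher Gini ratio signals more heterogeneity among the values being compared, and a higher tau signals that cluster membership predicts the outcome better.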

References

  • Berger M, Tutz G (2018) Tree-structured clustering in fixed effects models. J Comput Graph Stat 27(2):380–392

  • Bragg F, Cromwell DA, Edozien L (2010) Variation in rates of caesarean section among English NHS trusts after accounting for maternal and clinical risk: cross sectional study. BMJ 341:c5065. https://doi.org/10.1136/bmj.c5065

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

  • Caceres IA, Arcaya M, Declercq E, Belanoff CM, Janakiraman V, Cohen B, Ecker J, Smith LA, Subramanian SV (2013) Hospital differences in cesarean deliveries in Massachusetts (US) 2004–2006: the case against case-mix artifact. PLoS ONE 8(3):e57817

  • Cannas M, Conversano C, Mola F, Sironi E (2017) Variation in caesarean delivery rates across hospitals: a Bayesian semi-parametric approach. J Appl Stat 44(12):2095–2107

  • Dagum C (1997) A new approach to the decomposition of the Gini income inequality ratio. Empir Econ 22:515–531

  • Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Muller P, Vannucci M (eds) Bayesian inference for gene expression and proteomics. Cambridge University Press, Cambridge, pp 201–218

  • Dahl DB (2009) Modal clustering in a class of product partition models. Bayesian Anal 4:243–264

  • Duncan C, Jones K, Moon G (1998) Context, composition and heterogeneity: using multilevel models in health research. Soc Sci Med 46:97–117

  • Dunson D (2008) Nonparametric Bayes applications to biostatistics (Tech. Rep.). Biostatistics Branch, National Institute of Environmental Health Sciences, U.S. National Institutes of Health, USA

  • Egidi L, Pappadà R, Pauli F, Torelli N (2018) Relabelling in Bayesian mixture models by pivotal units. Stat Comput 28(4):957–969

  • European Perinatal Health Report (2013) The health and care of pregnant women and babies in Europe in 2010. EURO-PERISTAT Project with SCPE and EUROCAT, Bruxelles

  • Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230

  • Fritsch A, Ickstadt K (2009) Improved criteria for clustering based on the posterior similarity matrix. Bayesian Anal 4:367–392

  • Goodman LA, Kruskal WH (1954) Measures of association for cross classifications. J Am Stat Assoc 49:732–764

  • Grilli L, Panzera A, Rampichini C (2018) Clustering upper level units in multilevel models for ordinal data. In: Mola F, Conversano C, Vichi M (eds) Classification, (big) data analysis and statistical learning. Springer, Cham, pp 137–144

  • Guglielmi A, Ieva F, Paganoni AM, Ruggeri F, Soriano J (2014) Semiparametric Bayesian models for clustering and classification in the presence of unbalanced in-hospital survival. J R Stat Soc C (Appl Stat) 63:25–46

  • Heinzl F, Tutz G (2014) Clustering in linear mixed models with a group fused lasso penalty. Biom J 1:44–68

  • Jara A, Hanson T, Quintana F, Mueller P, Rosner G (2011) DPpackage: Bayesian semi-and nonparametric modeling in R. J Stat Softw 40(5):1–30

  • Kleinman KP, Ibrahim JG (1998) A semi-parametric Bayesian approach to generalized linear mixed models. Stat Med 17:2579–2596

  • Kozhimannil KB, Law MR, Virnig BA (2013) Cesarean delivery rates vary among US hospitals: reducing variation may address quality and cost issues. Health Aff 32(3):527–535

  • Lau JW, Green PJ (2007) Bayesian model-based clustering procedures. J Comput Graph Stat 16:526–558

  • Lee Y, Roberts CL, Patterson JA, Simpson JM, Nicholl MC, Morris JM, Ford JB (2013) Unexplained variation in hospital caesarean section rates. Med J Aust 199(5):348–353

  • MacEachern SN (2000) Dependent nonparametric processes, Technical report. Dept. of Statistics, Ohio State University, Ohio

  • Medvedovic M, Yeung K, Bumgarner R (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20:1222–1232

  • Meila M (2007) Comparing clusterings: an information based distance. J Multivar Anal 98:873–895

  • Mola F, Siciliano R (1997) A fast splitting procedure for classification trees. Stat Comput 7:209–216

  • Pauger D, Wagner H (2018) Bayesian effect fusion for categorical predictors. Bayesian Anal. https://doi.org/10.1214/18-BA1096

  • Pitman J, Yor M (1997) The two-parameter Poisson Dirichlet distribution derived from a stable subordinator. Ann Probab 25:855–900

  • Rastelli R, Friel N (2017) Optimal Bayesian estimators for latent variable cluster models. Stat Comput 28(6):1169–1186

  • Roberts CL, Nippita TA (2015) International caesarean section rates: the rising tide. Lancet Glob Health 3(5):111–117

  • Sturtz S, Ligges U, Gelman A (2005) R2WinBUGS: a package for running WinBUGS from R. J Stat Softw 12(3):1–16

  • Tutz G, Oelker M (2017) Modeling clustered heterogeneity: fixed effects, random effects and mixtures. Int Stat Rev 85(2):204–227

  • Wade S, Ghahramani Z (2018) Bayesian cluster analysis: point estimation and credible balls. Bayesian Anal 13(2):559–626


Acknowledgements

We would like to thank the Autonomous Region of Sardinia for providing the data used in Sect. 6. We also thank the editors and the two anonymous referees for their comments, which helped us substantially improve several parts of the paper.

Corresponding author

Correspondence to Claudio Conversano.

About this article

Cite this article

Conversano, C., Cannas, M., Mola, F. et al. Random effects clustering in multilevel modeling: choosing a proper partition. Adv Data Anal Classif 13, 279–301 (2019). https://doi.org/10.1007/s11634-018-0347-9

  • DOI: https://doi.org/10.1007/s11634-018-0347-9
