The determination of uncertainty levels in robust clustering of subjects with longitudinal observations using the Dirichlet process mixture

Rikhtehgaran, Reyhaneh; Kazemi, Iraj

doi:10.1007/s11634-016-0262-x


Regular Article
Published: 28 June 2016

Volume 10, pages 541–562 (2016)

Advances in Data Analysis and Classification

Reyhaneh Rikhtehgaran¹ &
Iraj Kazemi²

Abstract

In this paper we introduce a new method for the cluster analysis of longitudinal data that focuses on determining the uncertainty level of each subject's cluster membership. The method uses the Dirichlet-t distribution, which exploits the robustness of the Student-t distribution within a Bayesian semi-parametric framework; alongside robustly clustering subjects, it evaluates the uncertainty level of each subject's membership in its cluster. Both the number of clusters and the uncertainty levels are treated as unknown when fitting Dirichlet process mixture models. Two simulation studies are conducted to demonstrate the proposed methodology, and the method is applied to cluster a real data set taken from gene expression studies.
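The Dirichlet-t construction mentioned in the abstract can be illustrated with a minimal sketch. It follows our reading of Finegold and Drton (2014): each of the T coordinates (assumed here to be the time points of one subject's trajectory) receives its own latent precision, drawn from a random discrete distribution with a Dirichlet process prior whose base measure is Gamma(ν/2, ν/2). The Dirichlet process is approximated by truncated stick-breaking (Sethuraman 1994; Ishwaran and James 2001). The function names, the truncation level `n_atoms`, and the NumPy-based sampler are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights approximating a DP(alpha, G0) draw."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    # remaining[k] = prod_{l<k} (1 - v_l): the stick length left before break k
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_dirichlet_t(mu, sigma, nu, alpha, n_atoms=50, rng=None):
    """Draw one vector from a Dirichlet-t distribution (illustrative sketch).

    Each coordinate j gets a latent precision tau_j drawn from a discrete
    random measure P ~ DP(alpha, Gamma(nu/2, rate=nu/2)), approximated by
    truncated stick-breaking. Coordinates assigned to the same atom share
    a scale. Returns the draw and the atom label of each coordinate.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    T = mu.shape[0]
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    # Atoms from the base measure Gamma(shape=nu/2, rate=nu/2); NumPy uses scale = 1/rate
    atoms = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n_atoms)
    labels = rng.choice(n_atoms, size=T, p=weights)
    tau = atoms[labels]
    # Gaussian core with covariance sigma, rescaled coordinate-wise by 1/sqrt(tau_j)
    z = rng.multivariate_normal(np.zeros(T), sigma)
    return mu + z / np.sqrt(tau), labels
```

Coordinates that land on the same atom share a scale, so a small α pushes the draw toward the classical multivariate t (one common scale for all coordinates), while a large α approaches independent scales per coordinate.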

Notes

The probability density function of a Dirichlet random vector \(\mathbf {u}=(u_1,\ldots ,u_k)^\prime \) with \(u_1,\ldots ,u_k\ge 0\) and \(\sum _{i=1}^k u_i=1\) is given by \(f(\mathbf {u})\propto \prod _{i=1}^k u_i^{\alpha _i-1}\), where \(\alpha _1,\ldots ,\alpha _k>0\) (see, e.g., Sorensen and Gianola 2002).
The probability density function of an inverse-Wishart random matrix \(\mathbf {U}\) of order \(T\) is given by \(f(\mathbf {U})\propto |\mathbf {U}|^{-\frac{\tau +T+1}{2}} \exp \left\{ -\frac{1}{2}\mathrm{tr}\left( \varPsi \mathbf {U}^{-1}\right) \right\} \), where \(\tau \) and \(\varPsi \) are the shape and the scale parameters, respectively (see, e.g., Sorensen and Gianola 2002).
The probability density function of an inverse-gamma random variable \(u\) is given by \(f(u)\propto u^{-(a+1)} e^{-b/u}\), where \(a\) and \(b\) are the shape and the scale parameters, respectively (see, e.g., Sorensen and Gianola 2002).
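The three densities in these notes can be checked numerically with a short sketch. It assumes NumPy and the parameterizations stated in the notes (inverse-Wishart with shape τ and scale Ψ, inverse-gamma with shape a and scale b). The function names are illustrative, and the samplers use standard textbook constructions (normalized gamma draws for the Dirichlet, the inverse of a Wishart built from Gaussian outer products, and the reciprocal of a gamma draw) rather than anything taken from the paper.

```python
import numpy as np


# Dirichlet(alpha): f(u) is proportional to prod_i u_i^(alpha_i - 1)
def dirichlet_log_kernel(u, alpha):
    """Unnormalized log-density: sum_i (alpha_i - 1) * log(u_i)."""
    u = np.asarray(u, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(np.sum((alpha - 1.0) * np.log(u)))


def sample_dirichlet(alpha, rng):
    """Draw u ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = rng.gamma(shape=np.asarray(alpha, dtype=float), scale=1.0)
    return g / g.sum()


# Inverse-Wishart(tau, Psi) of order T
def inv_wishart_log_kernel(U, tau, psi):
    """Unnormalized log-density: -(tau+T+1)/2 * log|U| - tr(Psi U^{-1}) / 2."""
    U = np.asarray(U, dtype=float)
    T = U.shape[0]
    try:
        chol = np.linalg.cholesky(U)  # raises if U is not positive definite
    except np.linalg.LinAlgError:
        return -np.inf  # outside the support
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    # tr(Psi U^{-1}) = tr(U^{-1} Psi), computed with a solve instead of an explicit inverse
    return float(-0.5 * (tau + T + 1) * logdet - 0.5 * np.trace(np.linalg.solve(U, psi)))


def sample_inv_wishart(tau, psi, rng):
    """Draw U ~ IW(tau, Psi) as the inverse of W ~ Wishart(tau, Psi^{-1}).

    W is built from tau Gaussian outer products, so tau must be an integer >= T.
    """
    T = psi.shape[0]
    x = rng.multivariate_normal(np.zeros(T), np.linalg.inv(psi), size=int(tau))
    U = np.linalg.inv(x.T @ x)
    return 0.5 * (U + U.T)  # symmetrize away round-off error


# Inverse-gamma(a, b): f(u) is proportional to u^-(a+1) * exp(-b/u)
def inv_gamma_log_kernel(u, a, b):
    """Unnormalized log-density: -(a + 1) * log(u) - b / u, for u > 0."""
    u = np.asarray(u, dtype=float)
    return -(a + 1.0) * np.log(u) - b / u


def sample_inv_gamma(a, b, size, rng):
    """Draw u ~ IG(a, b) as 1/g with g ~ Gamma(shape=a, rate=b)."""
    g = rng.gamma(shape=a, scale=1.0 / b, size=size)  # NumPy uses scale = 1/rate
    return 1.0 / g
```

As a quick check, with a = 5 and b = 2 the inverse-gamma mean is b/(a−1) = 0.5 and its mode is b/(a+1) = 1/3; the inverse-Wishart kernel is likewise maximized at Ψ/(τ+T+1).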

References

Andrews JL, McNicholas PD (2011a) Extending mixtures of multivariate \(t\)-factor analyzers. Stat Comput 21(3):361–373
Andrews JL, McNicholas PD (2011b) Mixtures of modified \(t\)-factor analyzers for model-based clustering, classification, and discriminant analysis. J Stat Plan Inference 141(4):1479–1486
Baek J, McLachlan GJ (2011) Mixtures of common \(t\)-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27(9):1269–1276
Bai X, Chen K, Yao W (2016) Mixture of linear mixed models using multivariate t distribution. J Stat Comput Simul 86(4):771–787
Chen L, Brown SD (2014) Bayesian estimation of membership uncertainty in model-based clustering. J Chemometr 28(5):358–369
Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown P, Herskowitz I (1998) The transcriptional program of sporulation in budding yeast. Science 282:699–705
Damien P, Wakefield J, Walker S (1999) Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. J R Stat Soc B 61:331–344
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc B 39:1–38
Dorazio RM (2009) On selecting a prior for the precision parameter of Dirichlet process mixture models. J Stat Plan Inference 139:3384–3390
Escobar MD (1994) Estimating normal means with a Dirichlet process prior. J Am Stat Assoc 89(425):268–277
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Finegold M, Drton M (2014) Robust Bayesian graphical modeling using Dirichlet t-distributions. Bayesian Anal 9(3):521–550
Fraley C, Raftery AE (1999) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Gilks WR, Wild P (1992) Adaptive rejection sampling for Gibbs sampling. Appl Stat 41(2):337–348
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109
Heinzl F, Tutz G (2013) Clustering in linear mixed models with approximate Dirichlet process mixtures using EM algorithm. Stat Model 13:41–67
Heinzl F, Fahrmeir L, Kneib T (2012) Additive mixed models with Dirichlet process mixture and P-spline priors. Adv Stat Anal 96:47–68
Ishwaran H, James LF (2001) Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc 96(453):161–173
Ishwaran H, James LF (2002) Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information. J Comput Graph Stat 11:508–532
Ismail MMB, Frigui H (2010) Possibilistic clustering based on robust modeling of finite generalized Dirichlet mixture. In: The 20th international conference on pattern recognition, pp 573–576
Ismail MMB, Frigui H (2014) Unsupervised clustering and feature weighting based on generalized Dirichlet mixture modeling. Inf Sci 274:35–54
Laird NM, Ware JH (1982) Random effects models for longitudinal data. Biometrics 38:963–974
Li Y, Müller P, Lin X (2011) Center-adjusted inference for a nonparametric Bayesian random effect distribution. Stat Sinica 21(3):1201–1223
Lin TI (2014) Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition. Comput Stat Data Anal 71:183–195
Lin TI, Ho HJ, Chen CL (2009) Analysis of multivariate skew normal models with incomplete data. J Multivar Anal 100:2337–2351
Lin TI, McNicholas PD, Hsiu JH (2014) Capturing patterns via parsimonious t mixture models. Stat Probab Lett 88:80–87
Lunn D, Spiegelhalter D, Thomas A, Best N (2009) The BUGS project: evolution, critique and future directions (with discussion). Stat Med 28:3049–3082
MacEachern SN (1994) Estimating normal means with a conjugate style Dirichlet process prior. Commun Stat 23:727–741
McNicholas PD (2013) Model-based clustering and classification via mixtures of multivariate \(t\)-distributions. In: Giudici P, Ingrassia S, Vichi M (eds) Statistical models for data analysis, studies in classification, data analysis, and knowledge organization. Springer International Publishing, Heidelberg
McNicholas PD, Subedi S (2012) Clustering gene expression time course data using mixtures of multivariate \(t\)-distributions. J Stat Plan Inference 142:1114–1127
Morris K, McNicholas PD, Scrucca L (2013) Dimension reduction for model-based clustering via mixtures of multivariate \(t\)-distributions. Adv Data Anal Classif 7(3):321–338
Munoz A, Carey V, Schouten JP, Segal M, Rosner B (1992) A parametric family of correlation structures for the analysis of longitudinal data. Biometrics 48(3):733–742
Rasmussen CE, de la Cruz BJ, Ghahramani Z, Wild DL (2009) Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures. IEEE/ACM Trans Comput Biol Bioinform 6:615–627
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sinica 4:639–650
Sorensen D, Gianola D (2002) Likelihood, Bayesian and MCMC methods in quantitative genetics. Springer, New York
Steane MA, McNicholas PD, Yada R (2012) Model-based classification via mixtures of multivariate \(t\)-factor analyzers. Commun Stat Simul Comput 41(4):510–523
Wakefield JC, Zhou C, Self SG (2003) Modelling gene expression over time: curve clustering with informative prior distributions. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian statistics, vol 7. Oxford University Press, Oxford, pp 721–732
Wang WL (2013) Multivariate t linear mixed models for irregularly observed multiple repeated measures with missing outcomes. Biom J 55:554–571
Wang WL, Fan TH (2011) Estimation in multivariate t linear mixed models for multiple longitudinal data. Stat Sinica 21:1857–1880
Wang WL, Lin TI (2014) Multivariate t nonlinear mixed-effects models for multi-outcome longitudinal data with missing values. Stat Med 33:3029–3046
Wang WL, Lin TI (2015) Robust model-based clustering via mixtures of skew-t distributions with missing information. Adv Data Anal Classif 9(4):423–445
Wang L, Wang X (2013) Hierarchical Dirichlet process model for gene expression clustering. EURASIP J Bioinform Syst Biol 2013:5

Acknowledgments

The authors gratefully acknowledge two reviewers for their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Mathematical Sciences, Isfahan University of Technology, Isfahan, 84156-83111, Iran
Reyhaneh Rikhtehgaran
Department of Statistics, University of Isfahan, Isfahan, 81746, Iran
Iraj Kazemi

Corresponding author

Correspondence to Reyhaneh Rikhtehgaran.

About this article

Cite this article

Rikhtehgaran, R., Kazemi, I. The determination of uncertainty levels in robust clustering of subjects with longitudinal observations using the Dirichlet process mixture. Adv Data Anal Classif 10, 541–562 (2016). https://doi.org/10.1007/s11634-016-0262-x

Received: 14 February 2015
Revised: 20 June 2016
Accepted: 21 June 2016
Published: 28 June 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11634-016-0262-x
