Abstract
This paper deals with the Bayesian analysis of finite mixture models with a fixed number of component distributions from natural exponential families with quadratic variance function (NEF-QVF). A unified Bayesian framework addressing the two main difficulties in this context is presented: the choice of the prior distribution and the parameter unidentifiability problem. To deal with the first issue, conjugate prior distributions are used, and an algorithm is developed to calculate the prior parameters yielding the least informative distribution within the conjugate class. Regarding the second issue, a general algorithm to solve the label-switching problem is presented. These techniques are easily applied in practice, as shown with an illustrative example.






Acknowledgments
Comments from David Ríos-Insua are gratefully acknowledged. This research was partially supported by Ministerio de Educación y Ciencia, Spain (Project TSI2004-06801-C04-03).
Appendix: Theoretical results
Lemma 1 is required for Proposition 1. In Algorithm 3, step 1 follows from Proposition 2 and step 2 follows from Proposition 1.
Lemma 1 The following expression holds:
Proof
Proposition 1 The following expression holds:
where C is a function that does not depend on the permutation v or on \(\widehat{\boldsymbol{\phi}}=(\widehat{\boldsymbol{\omega}}, \widehat{\boldsymbol{\mu}})\).
Proof
[By Lemma 1]
Proposition 2 The minimum over \(\widehat{\boldsymbol{\phi}}=(\widehat{\boldsymbol{\omega}}, \widehat{\boldsymbol{\mu}})\) of \(D=\sum\nolimits_{t=1}^{N} D\left[v_{t}\left(\boldsymbol{\phi}^{(t)}\right) \| \widehat{\boldsymbol{\phi}}\right]\) is achieved at:
Proof By Proposition 1, the problem can be divided into the following steps:
1. Choose \(\widehat{\omega}_{j}\) \((j=1, \ldots, k)\) to maximize
$$\sum\limits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \log \widehat{\omega}_{j}+\left(1-\omega_{v_{t}(j)}^{(t)}\right) \log \left(1-\widehat{\omega}_{j}\right). \quad (5)$$
2. Choose \(\widehat{\mu}_{j}\) \((j=1, \ldots, k)\) to minimize
$$\sum\limits_{t=1}^{N}-\omega_{v_{t}(j)}^{(t)} \widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right) \mu_{v_{t}(j)}^{(t)}+\omega_{v_{t}(j)}^{(t)} M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right). \quad (6)$$
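The two optimizations above admit closed-form solutions. The following is a minimal numerical sketch, assuming the relabeled MCMC draws \(\omega_{v_t(j)}^{(t)}\) and \(\mu_{v_t(j)}^{(t)}\) are stored as NumPy arrays of shape (N, k); the function name `reference_parameters` and the toy data are illustrative, not from the paper.

```python
import numpy as np

def reference_parameters(omega, mu):
    """Solve steps 1 and 2 of Proposition 2 in closed form.

    omega, mu : (N, k) arrays holding the relabeled draws
        omega[t, j] = omega_{v_t(j)}^{(t)} and mu[t, j] = mu_{v_t(j)}^{(t)}.
    """
    # Maximizer of (5): the average relabeled weight for each component.
    w_hat = omega.mean(axis=0)
    # Minimizer of (6): the weight-weighted average of the relabeled means.
    mu_hat = (omega * mu).sum(axis=0) / omega.sum(axis=0)
    return w_hat, mu_hat

# Toy example: N = 2 draws, k = 2 components.
omega = np.array([[0.3, 0.7],
                  [0.5, 0.5]])
mu = np.array([[1.0, 2.0],
               [3.0, 4.0]])
w_hat, mu_hat = reference_parameters(omega, mu)
# w_hat  -> [0.4, 0.6]
# mu_hat -> [2.25, 2.8333...]
```

The vectorized form reflects that the objective separates over components j, so each \(\widehat{\omega}_j\) and \(\widehat{\mu}_j\) is computed from its own column of draws.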
Dividing (5) by N yields:
Differentiating and setting the result equal to zero gives:
To solve (6), the following is computed:
and setting the derivative equal to zero gives:
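Spelled out, the calculus for the two steps runs as follows; this is a sketch consistent with (5) and (6), using the standard NEF identity \(M'(\theta)=\mu\), and is a reconstruction rather than the paper's own display. For step 1,
$$\frac{\partial}{\partial \widehat{\omega}_{j}}\,\frac{1}{N}\sum\limits_{t=1}^{N}\left[\omega_{v_{t}(j)}^{(t)} \log \widehat{\omega}_{j}+\left(1-\omega_{v_{t}(j)}^{(t)}\right) \log \left(1-\widehat{\omega}_{j}\right)\right]=0 \;\Longrightarrow\; \widehat{\omega}_{j}=\frac{1}{N}\sum\limits_{t=1}^{N}\omega_{v_{t}(j)}^{(t)}.$$
For step 2, differentiating (6) with respect to \(\widehat{\mu}_{j}\) and using \(M'\!\left(\widehat{\theta}_{j}(\widehat{\mu}_{j})\right)=\widehat{\mu}_{j}\),
$$\sum\limits_{t=1}^{N}\omega_{v_{t}(j)}^{(t)}\left(\widehat{\mu}_{j}-\mu_{v_{t}(j)}^{(t)}\right)\widehat{\theta}_{j}'\!\left(\widehat{\mu}_{j}\right)=0 \;\Longrightarrow\; \widehat{\mu}_{j}=\frac{\sum\nolimits_{t=1}^{N}\omega_{v_{t}(j)}^{(t)}\,\mu_{v_{t}(j)}^{(t)}}{\sum\nolimits_{t=1}^{N}\omega_{v_{t}(j)}^{(t)}}.$$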
Cite this article
Rufo, M.J., Martín, J. & Pérez, C.J. Bayesian analysis of finite mixture models of distributions from exponential families. Computational Statistics 21, 621–637 (2006). https://doi.org/10.1007/s00180-006-0018-8