
Bayesian analysis of finite mixture models of distributions from exponential families


Abstract

This paper deals with the Bayesian analysis of finite mixture models with a fixed number of component distributions from natural exponential families with quadratic variance function (NEF-QVF). A unified Bayesian framework is presented that addresses the two main difficulties in this context, namely the choice of the prior distribution and the parameter unidentifiability problem. To deal with the first issue, conjugate prior distributions are used, and an algorithm is developed to calculate the hyperparameters yielding the least informative prior within the class of conjugate distributions. Regarding the second issue, a general algorithm to solve the label-switching problem is presented. These techniques are easily applied in practice, as shown with an illustrative example.



Acknowledgments

Comments from David Ríos-Insua are gratefully acknowledged. This research was partially supported by the Ministerio de Educación y Ciencia, Spain (Project TSI2004-06801-C04-03).

Author information


Corresponding author

Correspondence to C. J. Pérez.

Appendix: Theoretical results

Lemma 1 is required for Proposition 1. In Algorithm 3, step 1 follows from Proposition 2 and step 2 follows from Proposition 1.

Lemma 1 The following expression holds:

$$\int \mathrm{NQ}\left(\mu_{l}, V\left(\mu_{l}\right)\right) \log \left(\mathrm{NQ}\left(\mu_{i}, V\left(\mu_{i}\right)\right)\right) \mathrm{d} x=\theta_{i}\left(\mu_{i}\right) \mu_{l}-M\left(\theta_{i}\left(\mu_{i}\right)\right).$$

Proof

$$\begin{aligned} \int \mathrm{NQ}\left(\mu_{l}, V\left(\mu_{l}\right)\right) \log \left(\mathrm{NQ}\left(\mu_{i}, V\left(\mu_{i}\right)\right)\right) \mathrm{d} x &= E\left(X_{l} \theta_{i}\left(\mu_{i}\right)-M\left(\theta_{i}\left(\mu_{i}\right)\right)\right) \\ &= \theta_{i}\left(\mu_{i}\right) E\left(X_{l}\right)-M\left(\theta_{i}\left(\mu_{i}\right)\right) \\ &= \theta_{i}\left(\mu_{i}\right) \mu_{l}-M\left(\theta_{i}\left(\mu_{i}\right)\right), \end{aligned}$$

where \(X_{l} \sim \mathrm{NQ}\left(\mu_{l}, V\left(\mu_{l}\right)\right)\).
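As a quick numerical sanity check (an illustration, not part of the paper), Lemma 1 can be verified for the Poisson member of the NEF-QVF family, where \(\theta(\mu)=\log \mu\), \(M(\theta)=e^{\theta}\) and the density relative to the carrier measure is \(\exp (x \theta-M(\theta))\); the means \(\mu_{l}\) and \(\mu_{i}\) below are arbitrary:

```python
# Monte Carlo check of Lemma 1 for the Poisson case: theta(mu) = log(mu),
# M(theta) = exp(theta), density exp(x*theta - M(theta)) w.r.t. the carrier.
import numpy as np

rng = np.random.default_rng(0)
mu_l, mu_i = 3.0, 5.0                    # arbitrary illustrative means

theta_i = np.log(mu_i)                   # natural parameter theta_i(mu_i)
M_theta_i = np.exp(theta_i)              # cumulant function M(theta_i(mu_i))

x = rng.poisson(mu_l, size=1_000_000)    # X_l ~ NQ(mu_l, V(mu_l))
mc = np.mean(x * theta_i - M_theta_i)    # left-hand side, estimated
exact = theta_i * mu_l - M_theta_i       # right-hand side of Lemma 1
print(mc, exact)                         # both close to 3*log(5) - 5
```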

Proposition 1 The following expression holds:

$$\begin{array}{r}{D[v(\boldsymbol{\phi}) \| \widehat{\boldsymbol{\phi}}]=C-\sum\limits_{j=1}^{k}\left\{\omega_{v(j)} \log \ \widehat{\omega}_{j}+\left(1-\omega_{v(j)}\right) \log \left(1-\widehat{\omega}_{j}\right)\right.} \\ {+\omega_{v(j)} \widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right) \mu_{v(j)}-\omega_{v(j)} M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right) \}}\end{array}$$

where C is a function that does not depend on the permutation v or on \(\widehat{\boldsymbol{\phi}}=(\widehat{\boldsymbol{\omega}}, \widehat{\boldsymbol{\mu}})\).

Proof

$$\begin{aligned} D[v(\boldsymbol{\phi}) \| \widehat{\boldsymbol{\phi}}] &= \sum_{j=1}^{k} D\left[\omega_{v(j)} \mathrm{NQ}\left(\mu_{v(j)}, V\left(\mu_{v(j)}\right)\right) \| \widehat{\omega}_{j} \mathrm{NQ}\left(\widehat{\mu}_{j}, V\left(\widehat{\mu}_{j}\right)\right)\right] \\ &= \sum_{j=1}^{k}\left\{\omega_{v(j)} \log \frac{\omega_{v(j)}}{\widehat{\omega}_{j}}+\left(1-\omega_{v(j)}\right) \log \frac{1-\omega_{v(j)}}{1-\widehat{\omega}_{j}}\right. \\ &\qquad \left.+\,\omega_{v(j)} \int \mathrm{NQ}\left(\mu_{v(j)}, V\left(\mu_{v(j)}\right)\right) \log \frac{\mathrm{NQ}\left(\mu_{v(j)}, V\left(\mu_{v(j)}\right)\right)}{\mathrm{NQ}\left(\widehat{\mu}_{j}, V\left(\widehat{\mu}_{j}\right)\right)} \mathrm{d} x\right\} \\ &= \sum_{j=1}^{k}\left\{\omega_{j} \log \omega_{j}+\left(1-\omega_{j}\right) \log \left(1-\omega_{j}\right)+\omega_{j} \int \mathrm{NQ}\left(\mu_{j}, V\left(\mu_{j}\right)\right) \log \mathrm{NQ}\left(\mu_{j}, V\left(\mu_{j}\right)\right) \mathrm{d} x\right\} \\ &\quad -\sum_{j=1}^{k}\left\{\omega_{v(j)} \log \widehat{\omega}_{j}+\left(1-\omega_{v(j)}\right) \log \left(1-\widehat{\omega}_{j}\right)+\omega_{v(j)} \int \mathrm{NQ}\left(\mu_{v(j)}, V\left(\mu_{v(j)}\right)\right) \log \mathrm{NQ}\left(\widehat{\mu}_{j}, V\left(\widehat{\mu}_{j}\right)\right) \mathrm{d} x\right\}. \end{aligned}$$

The first sum does not depend on \(v\) or \(\widehat{\boldsymbol{\phi}}\), since summing over \(j=1, \ldots, k\) is invariant under the relabelling \(j \mapsto v(j)\); denote it by \(C\). By Lemma 1, the last integral equals \(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right) \mu_{v(j)}-M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right)\), so:

$$= C-\sum\limits_{j=1}^{k}\left\{\omega_{v(j)} \log \widehat{\omega}_{j}+\left(1-\omega_{v(j)}\right) \log \left(1-\widehat{\omega}_{j}\right)+\omega_{v(j)} \widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right) \mu_{v(j)}-\omega_{v(j)} M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right)\right\}.$$
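Proposition 1 shows that, for a single draw, minimizing \(D[v(\boldsymbol{\phi}) \| \widehat{\boldsymbol{\phi}}]\) over permutations \(v\) is a linear assignment problem on the per-component terms inside the braces. A minimal sketch of this permutation step (again under the Poisson assumptions used above; the function name is hypothetical and the code is illustrative, not from the paper):

```python
# Optimal relabelling of one posterior draw via Proposition 1, Poisson case
# (theta(mu) = log(mu), M(theta) = exp(theta)).
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(w, mu, w_hat, mu_hat):
    """Return the permutation v minimizing D[v(phi) || phi_hat].

    w, mu: weights and means of one MCMC draw; w_hat, mu_hat: current
    reference estimates. All are length-k arrays.
    """
    theta_hat = np.log(mu_hat)     # theta_j(mu_hat_j)
    M_hat = np.exp(theta_hat)      # M(theta_j(mu_hat_j))
    # gain[l, j]: braced terms of Proposition 1 when v(j) = l
    gain = (np.outer(w, np.log(w_hat))
            + np.outer(1.0 - w, np.log(1.0 - w_hat))
            + np.outer(w * mu, theta_hat)
            - np.outer(w, M_hat))
    row, col = linear_sum_assignment(-gain)  # maximize the total gain
    v = np.empty_like(col)
    v[col] = row                             # v(j) = component matched to label j
    return v
```

Applying `relabel` to each draw \((\boldsymbol{\omega}^{(t)}, \boldsymbol{\mu}^{(t)})\) yields the permutations \(v_{t}\) that enter Proposition 2.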

Proposition 2 The minimum over \(\widehat{\boldsymbol{\phi}}=(\widehat{\boldsymbol{\omega}}, \widehat{\boldsymbol{\mu}})\) of \(D=\sum\nolimits_{t=1}^{N} D\left[v_{t}\left(\boldsymbol{\phi}^{(t)}\right) \| \widehat{\boldsymbol{\phi}}\right]\) is achieved at:

$$\widehat{\omega}_{j}=\frac{1}{N} \sum\limits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \quad \text{and} \quad \widehat{\mu}_{j}=\frac{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \mu_{v_{t}(j)}^{(t)}}{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}}.$$

Proof By Proposition 1, the problem can be divided into the following steps:

  1. Choose \(\widehat{\omega}_{j}\) \((j=1, \ldots, k)\) to maximize

    $$\sum\limits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \log \widehat{\omega}_{j}+\left(1-\omega_{v_{t}(j)}^{(t)}\right) \log \left(1-\widehat{\omega}_{j}\right).$$
    (5)
  2. Choose \(\widehat{\mu}_{j}\) \((j=1, \ldots, k)\) to minimize

    $$\sum\limits_{t=1}^{N}-\omega_{v_{t}(j)}^{(t)} \widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right) \mu_{v_{t}(j)}^{(t)}+\omega_{v_{t}(j)}^{(t)} M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right).$$
    (6)

Dividing (5) by N yields:

$$\frac{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}}{N} \log\ \widehat{\omega}_{j}+\left(1-\frac{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}}{N}\right) \log \left(1-\widehat{\omega}_{j}\right).$$

Writing \(a=\frac{1}{N} \sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}\), differentiating with respect to \(\widehat{\omega}_{j}\) and setting the derivative equal to zero gives \(a / \widehat{\omega}_{j}-(1-a) /\left(1-\widehat{\omega}_{j}\right)=0\) (a maximum, since the objective is concave in \(\widehat{\omega}_{j}\)), so that:

$$\widehat{\omega}_{j}=\frac{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}}{N}.$$

To solve (6), the following derivative is calculated:

$$\frac{\partial\left[\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\left(-\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \mu_{v_{t}(j)}^{(t)}\right)+M\left(\widehat{\theta}_{j}\left(\widehat{\mu}_{j}\right)\right) \sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}\right]}{\partial \widehat{\mu}_{j}},$$

Since \(M^{\prime}(\theta)=\mu\) for natural exponential families, this derivative equals \(\widehat{\theta}_{j}^{\prime}\left(\widehat{\mu}_{j}\right)\left(\widehat{\mu}_{j} \sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}-\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \mu_{v_{t}(j)}^{(t)}\right)\), and setting it equal to zero yields:

$$\widehat{\mu}_{j}=\frac{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)} \mu_{v_{t}(j)}^{(t)}}{\sum\nolimits_{t=1}^{N} \omega_{v_{t}(j)}^{(t)}}.$$
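In practice, Proposition 2 provides a closed-form update for the reference estimates once the draws have been relabelled; alternating it with the assignment step sketched after Proposition 1 reproduces the two steps of Algorithm 3. A minimal sketch, assuming `W` and `MU` are \((N, k)\) arrays of draws already permuted by the \(v_{t}\) (illustrative, not from the paper):

```python
# Reference-estimate update of Proposition 2: weight-averaged pivots over
# N relabelled MCMC draws.
import numpy as np

def update_reference(W, MU):
    """W, MU: (N, k) arrays of relabelled weights and means."""
    w_hat = W.mean(axis=0)                          # (1/N) sum_t w_{v_t(j)}
    mu_hat = (W * MU).sum(axis=0) / W.sum(axis=0)   # weighted average of means
    return w_hat, mu_hat
```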

Cite this article

Rufo, M.J., Martín, J. & Pérez, C.J. Bayesian analysis of finite mixture models of distributions from exponential families. Computational Statistics 21, 621–637 (2006). https://doi.org/10.1007/s00180-006-0018-8