Fitting a Mixture Distribution to a Variable Subject to Heteroscedastic Measurement Errors

Thamerus, Markus

doi:10.1007/s001800300129

Published: 04 November 2019

Volume 18, pages 1–17 (2003)

Computational Statistics

Markus Thamerus¹

Summary

In a structural errors-in-variables model the true regressors are treated as stochastic variables that can only be measured with an additional error. Therefore, the distribution of the latent predictor variables and the distribution of the measurement errors play an important role in the analysis of such models. In this article the conventional assumptions of normality for these distributions are extended in two directions. The distribution of the true regressor variable is assumed to be a mixture of normal distributions, and the measurement errors are again taken to be normally distributed, but the error variances are allowed to be heteroscedastic. It is shown how an EM algorithm based solely on the error-prone observations of the latent variable can be used to find approximate ML estimates of the distribution parameters of the mixture. The procedure is illustrated with a Swiss data set that consists of regional radon measurements. The mean concentrations of the regions serve as proxies for the true regional averages of radon. The differing variability of the measurements within the regions motivated this approach.
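
As a reading aid, the setup described in the summary can be written out as follows. This is a sketch rather than the article's own numbered equations: the symbols α_k, θ_k, ς_k and σ_i follow the appendix, W_i denotes the error-prone observation, X_i and U_i denote the latent variable and its measurement error, and the independence of X_i and U_i is an assumption that is not stated in this excerpt.

$$W_i = X_i + U_i,\qquad X_i \sim \sum_{k=1}^m \alpha_k\, N(\theta_k, \varsigma_k^2),\qquad U_i \sim N(0, \sigma_i^2)\quad\Longrightarrow\quad W_i \sim \sum_{k=1}^m \alpha_k\, N(\theta_k, \varsigma_k^2 + \sigma_i^2).$$

Under these assumptions each observation has its own component variances ς_k² + σ_i², which is the term that appears in the denominators of the second derivatives in appendix A3.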

Acknowledgement

This research was partly supported by the Deutsche Forschungsgemeinschaft (German Research Council). I would like to thank Ch. E. Minder for discussions and for introducing me to the problem. I would also like to thank an anonymous referee for directing my attention to some very general problems in the estimation of mixture models. Helpful discussions with H. Schneeweiss and R. Wolf are gratefully acknowledged.

Author information

Authors and Affiliations

Institute of Statistics, University of Munich, Akademiestrasse 1, D-80799, München, Germany
Markus Thamerus

Appendix

A1 Maximization of (7) with respect to α₁,…,α_m: To maximize (7) with respect to the proportion parameters α₁,…,α_m under the restriction $\sum_{k=1}^m \alpha_k = 1$, we maximize the Lagrange function

$$La(\alpha_1, \ldots, \alpha_m, \lambda_L) = \sum_{k=1}^m \sum_{i=1}^n p_k^{(c)}(W_i)\log \alpha_k + \lambda_L\left(\sum_{k=1}^m \alpha_k - 1\right).$$

Partial differentiation yields the conditions

i)
$$-\lambda_L^{-1}\sum_{i=1}^n p_k^{(c)}(W_i) = \alpha_k^{(n)}\quad\text{for}\quad k = 1, \ldots, m,$$
which, inserted into the restriction, lead to the equation

ii)
$$\sum_{i=1}^n \sum_{k=1}^m p_k^{(c)}(W_i) = -\lambda_L.$$

If we replace the weights $p_k^{(c)}(W_i)$ in ii) with their original expressions (6), it is easily seen that −λ_L = n. Plugging this result into i) yields the solutions (8).
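
The argument in A1 can be checked numerically with a minimal sketch. Equation (6) is not reproduced in this excerpt, so the code assumes that the weights are the usual posterior component probabilities of a normal mixture in which component k has variance ς_k² + σ_i² for observation i. The function names, the simulated data and the starting values are all hypothetical. Under that assumption the weights of each observation sum to one, so their total is n, and condition i) reduces to α_k = (1/n) Σ_i p_k(W_i).

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def e_step_weights(w, sigma2, alpha, theta, varsigma2):
    """Posterior component weights p_k(W_i), returned with shape (n, m).

    Assumption: for observation i, component k has marginal variance
    varsigma2[k] + sigma2[i].
    """
    var = varsigma2[None, :] + sigma2[:, None]            # shape (n, m)
    dens = alpha[None, :] * normal_pdf(w[:, None], theta[None, :], var)
    return dens / dens.sum(axis=1, keepdims=True)

def update_alpha(p):
    """Proportion update from condition i) with -lambda_L = n."""
    return p.mean(axis=0)

# Simulated data (hypothetical): two latent components, known error variances.
rng = np.random.default_rng(0)
n = 200
sigma2 = rng.uniform(0.1, 0.5, size=n)
in_second = rng.random(n) < 0.3
x = np.where(in_second, rng.normal(4.0, 0.5, n), rng.normal(0.0, 1.0, n))
w = x + rng.normal(0.0, np.sqrt(sigma2))

p = e_step_weights(
    w, sigma2,
    alpha=np.array([0.5, 0.5]),
    theta=np.array([0.0, 3.0]),
    varsigma2=np.array([1.0, 1.0]),
)
alpha_new = update_alpha(p)
print("sum of all weights:", p.sum())      # equals n up to rounding
print("updated proportions:", alpha_new)   # non-negative, sums to one
```

Because the update is a column mean of a row-normalized weight matrix, the new proportions satisfy the restriction automatically and no explicit projection is needed.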

A2 Maximization of (7) with respect to θ_k and ς_k under a homoscedastic measurement error model: In the case of a homoscedastic measurement error model, that is ${U_i} \sim N(0,\sigma _U^2)$ for i = 1,…, n, the solutions $\theta _k^{(n)}$ and $\varsigma _k^{(n)}$ of the maximization problem (9) follow directly from the equations (10). For k = 1,…,m the next approximations $\theta _k^{(n)}$ and $\varsigma _k^{(n)}$, respectively $\varsigma _k^{2(n)}$, are given by

$$\begin{array}{lll} \theta_k^{(n)} &=& \frac{\sum_{i=1}^n W_i\, p_k^{(c)}(W_i)}{\sum_{i=1}^n p_k^{(c)}(W_i)} \quad\text{and} \\ \varsigma_k^{2(n)} &=& \frac{\sum_{i=1}^n (W_i - \theta_k^{(n)})^2\, p_k^{(c)}(W_i)}{\sum_{i=1}^n p_k^{(c)}(W_i)} - \sigma_U^2. \end{array}$$
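
The closed-form updates of A2 translate directly into array operations. The following is a minimal sketch in which the function name and the toy data are hypothetical. It takes the current weights as given and does not constrain the variance estimate, which can come out negative when the weighted spread of the observations is smaller than σ_U²; how that case is handled is not stated in this excerpt.

```python
import numpy as np

def m_step_homoscedastic(w, p, sigma_u2):
    """Closed-form updates of theta_k and varsigma_k^2 for Var(U_i) = sigma_u2.

    w: observations, shape (n,)
    p: current weights p_k^{(c)}(W_i), shape (n, m)
    """
    weight_sums = p.sum(axis=0)                                    # shape (m,)
    theta = (p * w[:, None]).sum(axis=0) / weight_sums
    spread = (p * (w[:, None] - theta[None, :]) ** 2).sum(axis=0) / weight_sums
    varsigma2 = spread - sigma_u2        # may be negative; left unconstrained
    return theta, varsigma2

# Toy example with hard (0/1) weights so the result can be checked by hand.
w = np.array([0.0, 2.0, 10.0, 12.0])
p = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta_new, varsigma2_new = m_step_homoscedastic(w, p, sigma_u2=0.5)
print(theta_new)       # weighted means of the two groups
print(varsigma2_new)   # weighted variances minus sigma_u2
```

In the toy example each group has weighted mean 1 and 11 and weighted variance 1, so subtracting σ_U² = 0.5 leaves 0.5 for both components.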

A3 Computation of the matrices H_k(θ_k, ς_k): The matrices H_k(θ_k, ς_k) of the second derivatives of q_k(θ_k, ς_k) used in the Newton approximation of the M step of the algorithm are given by

$$H_k(\theta_k, \varsigma_k) = \left( \begin{array}{cc} \frac{\partial^2 q_k}{\partial \theta_k^2} & \frac{\partial^2 q_k}{\partial \theta_k\, \partial \varsigma_k} \\ \frac{\partial^2 q_k}{\partial \varsigma_k\, \partial \theta_k} & \frac{\partial^2 q_k}{\partial \varsigma_k^2} \end{array} \right)$$

with the elements

$$\begin{array}{lll} \frac{\partial^{2} q_{k}}{\partial \theta_{k}^{2}} &=&-\sum\limits_{i=1}^{n} \frac{1}{\varsigma_{k}^{2}+\sigma_{i}^{2}} p_{k}^{(c)}(W_{i}), \\ \frac{\partial^{2} q_{k}}{\partial\varsigma_{k} \partial\theta_{k}} &=&-2 \sum\limits_{i=1}^{n} \frac{\varsigma_{k}(W_{i}-\theta_{k})}{(\varsigma_{k}^{2}+\sigma_{i}^{2})^{2}} p_{k}^{(c)}(W_{i})\quad\text{and} \\ \frac{\partial^{2} q_{k}}{\partial \varsigma_{k}^{2}} &=&\sum\limits_{i=1}^{n} \frac{1}{(\varsigma_{k}^{2}+\sigma_{i}^{2})^{3}}(\varsigma_{k}^{4}-\sigma_{i}^{4}+(\sigma_{i}^{2}-3 \varsigma_{k}^{2})(W_{i}-\theta_{k})^{2}) p_{k}^{(c)}(W_{i}).\end{array}$$
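
The elements of H_k can be checked against finite differences. The definition of q_k is not reproduced in this excerpt, so the sketch below assumes q_k(θ_k, ς_k) = Σ_i p_k(W_i) log N(W_i; θ_k, ς_k² + σ_i²) with additive constants dropped; differentiating that expression twice gives exactly the three elements above. The function names, the test data and the single plain Newton update are illustrative and are not taken from the article.

```python
import numpy as np

def q_k(theta, varsigma, w, sigma2, p_k):
    """Assumed objective: weighted normal log-density, constants dropped."""
    v = varsigma ** 2 + sigma2
    return np.sum(p_k * (-0.5 * np.log(v) - 0.5 * (w - theta) ** 2 / v))

def gradient(theta, varsigma, w, sigma2, p_k):
    """First derivatives of the assumed q_k with respect to (theta, varsigma)."""
    v = varsigma ** 2 + sigma2
    r = w - theta
    d_theta = np.sum(p_k * r / v)
    d_varsigma = np.sum(p_k * varsigma * (r ** 2 - v) / v ** 2)
    return np.array([d_theta, d_varsigma])

def hessian(theta, varsigma, w, sigma2, p_k):
    """Second derivatives, written exactly as the elements listed in A3."""
    v = varsigma ** 2 + sigma2
    r = w - theta
    h_tt = -np.sum(p_k / v)
    h_ts = -2.0 * np.sum(p_k * varsigma * r / v ** 2)
    h_ss = np.sum(
        p_k * (varsigma ** 4 - sigma2 ** 2
               + (sigma2 - 3.0 * varsigma ** 2) * r ** 2) / v ** 3
    )
    return np.array([[h_tt, h_ts], [h_ts, h_ss]])

def numeric_hessian(theta, varsigma, w, sigma2, p_k, h=1e-5):
    """Central differences of the gradient, used only as a sanity check."""
    x0 = np.array([theta, varsigma], dtype=float)
    cols = []
    for j in range(2):
        e = np.zeros(2)
        e[j] = h
        g_plus = gradient(*(x0 + e), w, sigma2, p_k)
        g_minus = gradient(*(x0 - e), w, sigma2, p_k)
        cols.append((g_plus - g_minus) / (2.0 * h))
    return np.column_stack(cols)

def newton_step(theta, varsigma, w, sigma2, p_k):
    """One plain Newton update x_new = x - H^{-1} g (illustrative only)."""
    g = gradient(theta, varsigma, w, sigma2, p_k)
    H = hessian(theta, varsigma, w, sigma2, p_k)
    return np.array([theta, varsigma]) - np.linalg.solve(H, g)

# Hypothetical test data: observations, error variances and weights.
rng = np.random.default_rng(1)
n = 50
w = rng.normal(2.0, 1.5, n)
sigma2 = rng.uniform(0.1, 0.6, n)
p_k = rng.uniform(0.0, 1.0, n)

H = hessian(1.5, 1.0, w, sigma2, p_k)
H_fd = numeric_hessian(1.5, 1.0, w, sigma2, p_k)
print(np.max(np.abs(H - H_fd)))            # small if the formulas agree
print(newton_step(1.5, 1.0, w, sigma2, p_k))
```

The analytic matrix is symmetric by construction, so only three sums are computed per component. A practical implementation would repeat the Newton update until the gradient is close to zero and would guard against steps that leave the region where H_k is negative definite.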

About this article

Cite this article

Thamerus, M. Fitting a Mixture Distribution to a Variable Subject to Heteroscedastic Measurement Errors. Computational Statistics 18, 1–17 (2003). https://doi.org/10.1007/s001800300129

Issue Date: March 2003
DOI: https://doi.org/10.1007/s001800300129
