
Mixture structure analysis using the Akaike Information Criterion and the bootstrap

Statistics and Computing

Abstract

Given i.i.d. observations x_1, x_2, ..., x_n drawn from a mixture of normal terms, one is often interested in determining the number of terms in the mixture and their defining parameters. Although determining the number of terms is intractable under the most general assumptions, the mixture structure can be elucidated given appropriate restrictions on the underlying mixture. This paper examines a new approach to this problem: mixture models are fitted in a data-driven manner to resampled (bootstrap) data sets and then pruned using the Akaike Information Criterion (AIC). Results of applying this procedure to artificially generated data sets and to a real-world data set are provided.
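To make the idea of combining the AIC with resampling concrete, the following is a minimal sketch and not the procedure of the paper. The abstract describes pruning data-driven mixture models; the sketch swaps in a simpler stand-in: for each bootstrap resample it fits univariate normal mixtures with k = 1, ..., max_k terms by plain EM, scores each with AIC = 2p - 2 log L (p = 3k - 1 free parameters), and records which k scores lowest. The function names, the quantile-based initialisation, the variance floor and all default values are assumptions made for illustration.

```python
import numpy as np

def fit_normal_mixture(x, k, n_iter=100):
    """Fit a k-term univariate normal mixture by plain EM; return its log-likelihood."""
    # Illustrative initialisation (an assumption): means at evenly spaced
    # quantiles, a common variance, equal mixing weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    var_floor = 1e-3 * x.var()  # stops a term collapsing onto repeated points

    def log_joint():
        # log(w_j) + log N(x_i | mu_j, var_j), shape (n, k)
        return (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)

    for _ in range(n_iter):
        # E-step: posterior probability of each term for each observation.
        lp = log_joint()
        resp = np.exp(lp - np.logaddexp.reduce(lp, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, var_floor)
    return float(np.logaddexp.reduce(log_joint(), axis=1).sum())

def aic(log_lik, k):
    """AIC for a k-term univariate normal mixture: k means, k variances, k - 1 weights."""
    n_params = 3 * k - 1
    return 2.0 * n_params - 2.0 * log_lik

def bootstrap_aic_votes(x, max_k=4, n_boot=20, seed=0):
    """Count, over bootstrap resamples, how often each k in 1..max_k minimises the AIC."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(max_k, dtype=int)
    for _ in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)  # resample with replacement
        scores = [aic(fit_normal_mixture(xb, k), k) for k in range(1, max_k + 1)]
        votes[int(np.argmin(scores))] += 1
    return votes

if __name__ == "__main__":
    # Artificial data: two well-separated normal terms (hypothetical example).
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-4.0, 1.0, 150), rng.normal(4.0, 1.0, 150)])
    print(bootstrap_aic_votes(data))  # votes[k - 1] = resamples preferring k terms
```

Tallying the AIC-preferred number of terms across resamples gives a rough picture of how stable the selected structure is under sampling variation. Because the AIC is known to favour larger models in some settings, the vote counts are best read as a diagnostic rather than a definitive answer.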




About this article

Cite this article

Solka, J.L., Wegman, E.J., Priebe, C.E. et al. Mixture structure analysis using the Akaike Information Criterion and the bootstrap. Statistics and Computing 8, 177–188 (1998). https://doi.org/10.1023/A:1008924323509
