Abstract
We use minimum message length (MML) estimation for mixture modelling. MML estimates are derived both to choose the number of components that best describes the data and to estimate the parameters of the component densities for Gaussian mixture models. An empirical comparison of criteria prominent in the literature for estimating the number of components in a data set is performed. We have found that MML coding considerations allow the derivation of useful results to guide the implementation of a mixture modelling program. These results allow model search to be controlled based on the minimum variance for a component and on the amount of data required to distinguish two overlapping components.
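The abstract describes choosing the number of mixture components by minimising a two-part message length. The sketch below is a hedged illustration, not the authors' exact MML87 formulation: it fits one-dimensional Gaussian mixtures by EM for several candidate component counts and scores each with a simplified code length, negative log-likelihood plus (p/2) log n for p free parameters. The variance floor loosely mirrors the paper's minimum-variance constraint on a component; the function names and synthetic data are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, k, iters=200, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with k components by EM; return log-likelihood.

    Quantile-based initialisation keeps the run deterministic; the variance
    floor prevents a component collapsing onto a single data point.
    """
    n = len(x)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    vars_ = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: log density of each point under each weighted component.
        log_p = (-0.5 * (x[:, None] - means) ** 2 / vars_
                 - 0.5 * np.log(2 * np.pi * vars_)
                 + np.log(weights))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)  # responsibilities
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        vars_ = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        vars_ = np.maximum(vars_, var_floor)
    return log_norm.sum()

def message_length(x, k):
    """Simplified two-part code length: -loglik + (p/2) log n (in nats)."""
    p = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -em_gmm_1d(x, k) + 0.5 * p * np.log(len(x))

# Two well-separated components; the criterion should recover k = 2.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
best_k = min(range(1, 4), key=lambda k: message_length(x, k))
print(best_k)
```

The penalty term here coincides with BIC up to a factor; a faithful MML treatment would instead code the parameters to an optimal precision derived from the Fisher information, which is the refinement the paper develops.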
References
Barron A.R. and Cover T.M. 1991. Minimum complexity density estimation. IEEE Trans. on Info. Theory 37: 1034–1054.
Baxter R.A. 1995. Finding overlapping distributions with MML. Technical Report 244, Dept. of Computer Science, Monash University, Clayton 3168, Australia.
Baxter R.A. and Oliver J.J. 1997. Overlapping mixture models. In: The Sixth International Workshop on Artificial Intelligence and Statistics. Ft. Lauderdale, Florida.
Berger J.O. and Pericchi L.R. 1994. The intrinsic Bayes factor for linear models. In: Invited Papers of the Fifth International Meeting on Bayesian Statistics, pp. 237–273.
Bezdek J.C. 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. New York, Plenum.
Binder D.A. 1978. Bayesian cluster analysis. Biometrika 65(1): 31–38.
Boulton D.M. and Wallace C.S. 1970. A program for numerical classification. The Computer Journal 13(1): 63–69.
Bozdogan H. 1983. Determining the number of component clusters in the standard multivariate normal mixture model using model-selection criteria. TR UIC/DQM/A83-1, Quantitative Methods Dept., University of Illinois, Chicago, Illinois 60680.
Bozdogan H. 1990. On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Communications in Statistics: Theory and Methods (A) 19(1): 221–278.
Bozdogan H. 1994. Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In: Bozdogan H. et al. (Eds.), Proc. of the First US/Japan Conf. on the Frontiers of Statistical Modeling: An Informational Approach. Kluwer Academic Publishers, pp. 69–113.
Cheeseman P. and Stutz J. 1996. Bayesian classification (AUTOCLASS): theory and results. In: Advances in Knowledge Discovery and Data Mining, AAAI Press/MIT Press.
Chickering D. and Heckerman D. 1996. Efficient approximation for the marginal likelihood of incomplete data given a Bayesian network. In: UAI'96. Morgan Kaufmann, pp. 158–168.
Chong Y.H. et al. 1989a. Automatic nephanalysis from infrared GMS data. In: Proc. Australian Joint A.I. Conf.
Chong Y.H., Pham B., Manton M., and Maeder A. 1989b. Automatic nephanalysis from infrared GMS data. Technical Report 89/125, Dept. of Computer Science, Monash University, Clayton 3168, Australia.
Conway J.H. and Sloane N.J.A. 1988. Sphere Packings, Lattices and Groups. New York, Springer-Verlag.
Cutler A. and Windham M.P. 1994. Information-based validity functionals for mixture analysis. In: Bozdogan H. et al. (Eds.), Proc. of the First US/Japan Conf. on the Frontiers of Statistical Modeling: An Informational Approach. Kluwer Academic Publishers, pp. 149–170.
Day N.E. 1969. Estimating the components of a mixture of normal distributions. Biometrika 56: 463–474.
Draper D. 1993. Assessment and propagation of model uncertainty. Technical Report 124, Dept. of Statistics, University of California, Los Angeles.
Everitt B.S. and Hand D.J. 1981. Finite Mixture Distributions. London, UK, Chapman and Hall.
Harvey A.C. 1981. The Econometric Analysis of Time Series. Oxford, Philip Allan.
Liang Z. et al. 1992. Parameter estimation of finite mixtures using the EM algorithm and information criteria with applications to medical image processing. IEEE Trans. on Nuclear Science 39(4): 1126–1133.
McLachlan G.J. and Basford K. 1988. Mixture Models: Inference and Applications to Clustering. New York, Marcel Dekker.
Mengersen K.L. and Robert C.P. 1993. Testing for mixtures via entropy distance and Gibbs sampling. Technical Report 9240, Document de travail, Crest, Insee. Paris.
Milligan G.W. and Cooper M.C. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(1): 159–179.
Neal R.M. 1995. Bayesian Learning for Neural Networks. PhD thesis, Graduate Dept. of Computer Science, Univ. of Toronto.
Nobile A. 1994. Bayesian analysis of finite mixture distributions. PhD thesis, Carnegie Mellon University.
Oliver J.J. and Baxter R.A. 1994. MML and Bayesianism: similarities and differences. Technical report TR 206, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia. Available on the WWW from http://www.cs.monash.edu.au/~jono.
Papp E., Dowe D., and Cox S.J.D. 1993. Spectral classification of radiometric data using an information theory approach. In: Proc. of the Conf. on Advanced Remote Sensing, Vol. 2, pp. 223–232.
Papp E. and Baxter R. 1994. Selection of variables for gold exploration from deep drill hole data using a novel information theory method. In: Geological Society of Australia Abstracts, Vol. 37.
Papp E., Weiss J., and Baxter R. 1994. Identification of areas favourable for horehound infestation using novel artificial intelligence unsupervised classification and decision tree inductive inference. In: Proc. of the Conf. on Advanced Remote Sensing, Vol. 2.
Phillips D.B. and Smith A.F.M. 1996. Bayesian model comparison via jump diffusions. In: Practical Markov chain Monte Carlo. London, Chapman and Hall, pp. 215–239.
Richardson S. and Green P.J. 1996. On Bayesian analysis of mixtures with an unknown number of components. Mathematics Research Report S-96-01, University of Bristol. Also to appear at J.R.S.S.B Meeting.
Rissanen J. 1978. Modeling by shortest data description. Automatica 14: 465–471.
Rissanen J. 1987. Stochastic Complexity. J. R. Statist. Soc. B 49(3): 223–239.
Rissanen J. 1989. Stochastic Complexity in Statistical Inquiry. N.J., World Scientific.
Roeder K. and Wasserman L. 1995. Practical Bayesian density estimation using mixtures of normals. Technical Report 633, Dept. of Statistics, Carnegie-Mellon University.
Schwarz G. 1978. Estimating the Dimension of a Model. Ann. Stat. 6: 461–464.
Spath H. 1980. Cluster Analysis Algorithms. Chichester, West Sussex, Ellis Horwood.
Wallace C.S. and Boulton D.M. 1968. An information measure for classification. Computer Journal 11(2): 195–209.
Wallace C.S. and Freeman P.R. 1987. Estimation and inference by compact coding. J. R. Statist. Soc. B 49(3): 240–265.
Windham M.P. and Cutler A. 1992. Information ratios for validating mixture models. Journal of the American Statistical Association 87(420): 1188–1192.
Wolfe J.H. 1970. Pattern clustering by multivariate mixture analysis. Multivariate Behavioural Research 5: 329–350.
Cite this article
Baxter, R.A., Oliver, J.J. Finding overlapping components with MML. Statistics and Computing 10, 5–16 (2000). https://doi.org/10.1023/A:1008928315401