Abstract
For clustering mixed categorical and continuous data, Lawrence and Krzanowski (1996) proposed a finite mixture model in which component densities conform to the location model. In the graphical models literature the location model is known as the homogeneous Conditional Gaussian model. In this paper it is shown that their model is not identifiable without imposing additional restrictions. Specifically, for g groups and m locations, $(g!)^{m-1}$ distinct sets of parameter values (not including permutations of the group mixing parameters) produce the same likelihood function. Excessive shrinkage of parameter estimates in a simulation experiment reported by Lawrence and Krzanowski (1996) is shown to be an artifact of the model's non-identifiability. Identifiable finite mixture models can be obtained by imposing restrictions on the conditional means of the continuous variables. These new identified models are assessed in simulation experiments. The conditional mean structure of the continuous variables in the restricted location mixture models is similar to that in the underlying variable mixture models proposed by Everitt (1988), but the restricted location mixture models are more computationally tractable.
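The non-identifiability count can be illustrated numerically. The sketch below is not taken from the paper: it assumes one common parametrization of the location mixture, in which group i has mixing proportion alpha[i], location probabilities p[i][s], and a single continuous variable with mean mu[i][s] and a common variance. All function and variable names are hypothetical. Under that assumption, relabelling the groups separately within each location leaves the joint density unchanged, and fixing the labelling at one location (to discount overall group permutations) leaves g! choices at each of the remaining m − 1 locations.

```python
import itertools
import math


def normal_pdf(y, mean, var):
    """Univariate normal density."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_density(s, y, alpha, p, mu, var):
    """Mixture density of (location s, continuous value y) under the assumed model."""
    return sum(
        alpha[i] * p[i][s] * normal_pdf(y, mu[i][s], var)
        for i in range(len(alpha))
    )


def relabel_within_locations(alpha, p, mu, perms):
    """Apply a separate group permutation perms[s] at each location s.

    Returns a new (alpha, p, mu) that generally differs from the input
    but defines the same joint density.
    """
    g, m = len(alpha), len(p[0])
    # Joint weight of (group i, location s) after relabelling.
    w = [[alpha[perms[s][i]] * p[perms[s][i]][s] for s in range(m)] for i in range(g)]
    new_alpha = [sum(w[i]) for i in range(g)]
    new_p = [[w[i][s] / new_alpha[i] for s in range(m)] for i in range(g)]
    new_mu = [[mu[perms[s][i]][s] for s in range(m)] for i in range(g)]
    return new_alpha, new_p, new_mu


def count_equivalent_sets(g, m):
    """(g!)^(m-1): label choices at m-1 locations, with one location held fixed."""
    return math.factorial(g) ** (m - 1)


# Illustrative values only (g = 2 groups, m = 2 locations).
alpha = [0.4, 0.6]
p = [[0.3, 0.7], [0.8, 0.2]]
mu = [[0.0, 1.0], [3.0, 5.0]]
var = 1.0

identity = tuple(range(len(alpha)))
equivalent = []
# Hold location 0 at the identity labelling; permute freely elsewhere.
for rest in itertools.product(itertools.permutations(identity), repeat=len(p[0]) - 1):
    equivalent.append(relabel_within_locations(alpha, p, mu, [identity, *rest]))

max_diff = max(
    abs(joint_density(s, y, alpha, p, mu, var) - joint_density(s, y, a, q, v, var))
    for (a, q, v) in equivalent
    for s in range(len(p[0]))
    for y in (-1.0, 0.5, 2.0, 4.0)
)
print(len(equivalent), count_equivalent_sets(len(alpha), len(p[0])), max_diff)
```

The enumeration holds the first location fixed purely as a convention; any location could serve as the reference. The parameter sets are pairwise distinct only for generic values, for example when the conditional means differ across groups within each location.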
References
Celeux, G. and Govaert, G. (1995) Gaussian parsimonious clustering models. Pattern Recognition, 28, 781–793.
Everitt, B. S. (1988) A finite mixture model for the clustering of mixed-mode data. Statistics and Probability Letters, 6, 305–309.
Everitt, B. S. and Merette, C. (1990) The clustering of mixed-mode data: a comparison of possible approaches. Journal of Applied Statistics, 17, 283–297.
Krzanowski, W. J. (1993) The location model for mixtures of categorical and continuous variables. Journal of Classification, 10, 25–49.
Lawrence, C. J. and Krzanowski, W. J. (1996) Mixture separation for mixed-mode data. Statistics and Computing, 6, 85–92.
McLachlan, G. J. and Basford, K. E. (1988) Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York.
McLachlan, G. J. and Krishnan, T. (1997) The EM Algorithm and Extensions, Wiley, New York.
Redner, R. A. and Walker, H. F. (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195–239.
Titterington, D. M., Smith, A. F. M. and Makov, U. E. (1985) Statistical Analysis of Finite Mixture Distributions, Wiley, New York.
Whittaker, J. (1990) Graphical Models in Applied Multivariate Statistics, Wiley, Chichester.
Yakowitz, S. J. and Spragins, J. D. (1968) On the identifiability of finite mixtures. Annals of Mathematical Statistics, 39, 209–214.
Cite this article
Willse, A., Boik, R.J. Identifiable finite mixtures of location models for clustering mixed-mode data. Statistics and Computing 9, 111–121 (1999). https://doi.org/10.1023/A:1008842432747