Abstract
Normal mixture models are often used to cluster continuous data. However, conventional approaches to fitting these models have difficulty producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and compare our results with those of another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, correctly classifying many of them by location.
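The estimation problem described above can be illustrated with a minimal sketch in Python/NumPy. It assumes simulated Gaussian data and arbitrary, hypothetical sizes (n = 10 observations, p = 30 dimensions, q = 2 factors). It is not the authors' fitting procedure and fits no mixture. It only shows that the maximum-likelihood sample covariance has rank at most n - 1, and is therefore singular when n <= p. By contrast, the factor-analytic form Sigma = B B^T + D, used for each component covariance in a mixture of factor analyzers, stays positive definite whenever the diagonal matrix D has positive entries. The loading matrix `B` and uniquenesses `psi` below are illustrative values, not estimates.

```python
import numpy as np

def sample_covariance(X):
    """ML (divide-by-n) sample covariance of the rows of X."""
    Xc = X - X.mean(axis=0)          # centre each column
    return Xc.T @ Xc / X.shape[0]

def factor_covariance(B, psi):
    """Factor-analytic covariance Sigma = B B^T + D, with D = diag(psi), psi > 0."""
    return B @ B.T + np.diag(psi)

rng = np.random.default_rng(0)
n, p, q = 10, 30, 2                  # hypothetical sizes: fewer observations than dimensions
X = rng.normal(size=(n, p))          # simulated data, not the wine data

S = sample_covariance(X)
rank_S = np.linalg.matrix_rank(S)
print("rank of sample covariance:", rank_S, "of", p)   # at most n - 1, so S is singular

B = rng.normal(size=(p, q))          # hypothetical p x q loading matrix
psi = np.full(p, 0.5)                # hypothetical positive uniquenesses
Sigma = factor_covariance(B, psi)
print("rank of factor covariance:", np.linalg.matrix_rank(Sigma), "of", p)
print("smallest eigenvalue:", np.linalg.eigvalsh(Sigma).min())  # at least 0.5
```

The factor-analytic form also reduces the number of free covariance parameters per component from p(p+1)/2 to pq + p - q(q-1)/2 (465 versus 89 for the sizes above). This reduction is what makes estimation feasible when p is large relative to n.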
References
Ashenfelter, O.: California Versus All Challengers: The 1999 Cabernet Challenge (1999), http://www.liquidasset.com/report20.html
Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
Chang, W.C.: On using principal components before separating a mixture of two multivariate normal distributions. Applied Statistics 32, 267–275 (1983)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B 39, 1–38 (1977)
Healy, M.J.R.: Matrices for Statisticians. Clarendon, Oxford (1986)
Liu, J., Feng, J., Young, S.S.: PowerMV v0.61 (2005), http://www.niss.org/PowerMV/
Liu, L., Hawkins, D.M., Ghosh, S., Young, S.S.: Robust singular value decomposition analysis of microarray data. Proceedings of the National Academy of Sciences USA 100, 13167–13172 (2003)
McLachlan, G.J.: On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics 36, 318–324 (1987)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley, New York (1997)
McLachlan, G.J., Peel, D.: Mixtures of factor analyzers. In: Langley, P. (ed.) Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco (2000a)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000b)
McLachlan, G.J., Peel, D., Bean, R.W.: Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis 41, 379–388 (2003)
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., Altman, R.B.: Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525 (2001)
Young, S.: Private communication (2005)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bean, R., McLachlan, G. (2005). Cluster Analysis of High-Dimensional Data: A Case Study. In: Gallagher, M., Hogan, J.P., Maire, F. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2005. IDEAL 2005. Lecture Notes in Computer Science, vol 3578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508069_40
DOI: https://doi.org/10.1007/11508069_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26972-4
Online ISBN: 978-3-540-31693-0
eBook Packages: Computer Science (R0)