Abstract
Clustering is one of the core problems in machine learning. Many clustering algorithms aim to partition data along a single dimension. This approach may become inappropriate when the data are high-dimensional and multifaceted. This paper introduces a class of multidimensional mixture models called pouch latent tree models. We use them to perform cluster analysis on a data set consisting of 75 development indicators for 133 countries. Because these models contain multiple latent variables, we further propose a method to guide the selection of clustering variables. The results demonstrate that some interesting clusterings of countries can be obtained from multidimensional mixture models but not from single-dimensional ones.
Notes
- 1.
- 2. The NMI can be computed using the empirical distribution after discretizing the continuous attributes.
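The computation described in the note on NMI can be sketched as follows. This is a minimal illustration, not the paper's implementation: the equal-width binning, the default of five bins, the geometric-mean normalization sqrt(H(X) H(Y)), and the function names discretize, entropy, mutual_information, and nmi are all assumptions made here for the example.

```python
import numpy as np


def discretize(values, n_bins=5):
    """Map continuous values to integer bin labels (equal-width bins; an assumption)."""
    values = np.asarray(values, dtype=float).ravel()
    edges = np.histogram_bin_edges(values, bins=n_bins)
    # Use only the interior edges so labels fall in 0 .. n_bins - 1.
    return np.digitize(values, edges[1:-1])


def entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution of the labels."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_information(x, y):
    """Mutual information (in nats) from the empirical joint distribution."""
    x_vals, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    y_vals, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Build the joint frequency table by counting co-occurrences.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, whose contribution is zero
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def nmi(x, y):
    """Normalized mutual information, I(X;Y) / sqrt(H(X) H(Y)) (assumed normalization)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant variable carries no information
    return mutual_information(x, y) / np.sqrt(hx * hy)
```

To compare a continuous attribute with a discrete clustering variable, one would discretize the attribute first, for example nmi(discretize(attribute), cluster_labels), where attribute and cluster_labels are hypothetical arrays of equal length. The number of bins affects the estimate, so it is worth checking that conclusions are stable across a few choices.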
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Poon, L.K.M. (2017). Clustering with Multidimensional Mixture Models: Analysis on World Development Indicators. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science, vol 10261. Springer, Cham. https://doi.org/10.1007/978-3-319-59072-1_19
DOI: https://doi.org/10.1007/978-3-319-59072-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59071-4
Online ISBN: 978-3-319-59072-1
eBook Packages: Computer Science, Computer Science (R0)