Skip to main content

Clustering with Multidimensional Mixture Models: Analysis on World Development Indicators

  • Conference paper
  • First Online:
Book cover Advances in Neural Networks - ISNN 2017 (ISNN 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10261))

Included in the following conference series:

Abstract

Clustering is one of the core problems in machine learning. Many clustering algorithms aim to partition data along a single dimension. This approach may become inappropriate when data has higher dimension and is multifaceted. This paper introduces a class of mixture models with multiple dimensions called pouch latent tree models. We use them to perform cluster analysis on a data set consisting of 75 development indicators for 133 countries. We further propose a method that guides the selection of clustering variables due to the existence of multiple latent variables. The analysis results demonstrate that some interesting clusterings of countries can be obtained from mixture models with multiple dimensions but not those with single dimensions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://data.worldbank.org/data-catalog/world-development-report-2014.

  2. 2.

    The NMI can be computed using the empirical distribution after discretizing the continuous attributes.

References

  1. Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3), 803–821 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bouveyrona, C., Brunet-Saumard, C.: Model-based clustering of high-dimensional data: a review. Comput. Stat. Data Anal. 71, 52–78 (2014)

    Article  MathSciNet  Google Scholar 

  3. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2006)

    MATH  Google Scholar 

  4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977)

    MathSciNet  MATH  Google Scholar 

  5. Dy, J.G., Brodley, C.E.: Feature selection for unsupervised learning. J. Mach. Learn. Res. 5, 845–889 (2004)

    MathSciNet  MATH  Google Scholar 

  6. Fraley, C., Raftery, A.E., Murphy, T.B., Scrucca, L.: MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Department of Statistics, University of Washington, Technical report (2012)

    Google Scholar 

  7. Galimberti, G., Soffritti, G.: Model-based methods to identify multiple cluster structures in a data set. Comput. Stat. Data Anal. 52, 520–536 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  8. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)

    Article  Google Scholar 

  9. Kriegel, H.P., Kröger, P., Zimek, A.: Clustering high dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data 3(1), 1–58 (2009)

    Article  Google Scholar 

  10. McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

    Book  MATH  Google Scholar 

  11. Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor. Newsl. 6(1), 90–105 (2004)

    Article  Google Scholar 

  12. Poon, L.K.M., Zhang, N.L., Chen, T., Wang, Y.: Variable selection in model-based clustering: to do or to facilitate. In: Proceedings of the 27th International Conference on Machine Learning, pp. 887–894 (2010)

    Google Scholar 

  13. Poon, L.K.M., Zhang, N.L., Liu, T., Liu, A.H.: Model-based clustering of high-dimensional data: variable selection versus facet determination. Int. J. Approx. Reason. 54(1), 196–215 (2013)

    Article  MATH  Google Scholar 

  14. Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am.Stat. Assoc. 101(473), 168–178 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  15. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

    Article  MathSciNet  MATH  Google Scholar 

  16. Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., Herawan, T.: Big data clustering: a review. In: Murgante, B., et al. (eds.) ICCSA 2014. LNCS, vol. 8583, pp. 707–720. Springer, Cham (2014). doi:10.1007/978-3-319-09156-3_49

    Google Scholar 

  17. Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

    MathSciNet  MATH  Google Scholar 

  18. van Buuren, S., Groothuis-Oudshoorn, K.: MICE: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Leonard K. M. Poon .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Poon, L.K.M. (2017). Clustering with Multidimensional Mixture Models: Analysis on World Development Indicators. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science(), vol 10261. Springer, Cham. https://doi.org/10.1007/978-3-319-59072-1_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-59072-1_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59071-4

  • Online ISBN: 978-3-319-59072-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics