Cluster Analysis as a Decision-Making Tool: A Methodological Review

  • Conference paper
Decision Economics: In the Tradition of Herbert A. Simon's Heritage (DCAI 2017)

Abstract

Cluster analysis has long played an important role in a broad variety of areas, such as psychology, biology, and computer science. It has established itself as a valuable tool in marketing and business, thanks to its capacity to support decision-making processes. Traditionally, clustering approaches concentrate on either purely numerical or purely categorical data. An important area of cluster analysis deals with mixed data, composed of both numerical and categorical attributes. Clustering mixed data is not simple, because there is a wide gap between the similarity metrics for these two kinds of data. In this review we provide some technical details about the kinds of distances that can be used with mixed data types. Finally, we emphasize that in most applications of cluster analysis practitioners focus on either numeric or categorical variables, which lessens the effectiveness of the method as a decision-making tool.
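To make the gap between numeric and categorical similarity concrete, the following is a minimal sketch of one possible mixed-data dissimilarity, in the spirit of the k-prototypes approach: squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical ones. The abstract does not say which distances the review covers, so this choice is an assumption for illustration only. The function name `mixed_dissimilarity`, the weight `gamma`, and the sample records are all hypothetical.

```python
from typing import Sequence


def mixed_dissimilarity(
    a_num: Sequence[float],
    a_cat: Sequence[str],
    b_num: Sequence[float],
    b_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Illustrative mixed-data dissimilarity (an assumption, not the paper's method).

    Numeric part: squared Euclidean distance.
    Categorical part: number of attributes whose values differ.
    gamma: hypothetical weight balancing the two parts.
    """
    if len(a_num) != len(b_num) or len(a_cat) != len(b_cat):
        raise ValueError("both records must have the same attributes")

    # Squared Euclidean distance over the numeric attributes.
    numeric_part = sum((x - y) ** 2 for x, y in zip(a_num, b_num))

    # Simple matching: count categorical attributes that disagree.
    categorical_part = sum(1 for x, y in zip(a_cat, b_cat) if x != y)

    return numeric_part + gamma * categorical_part


# Two hypothetical customer records: (scaled numeric features, categorical features).
d = mixed_dissimilarity(
    [0.2, 0.5], ["retail", "north"],
    [0.4, 0.5], ["retail", "south"],
    gamma=0.5,
)
print(d)
```

The weight `gamma` is where the difficulty mentioned in the abstract shows up: numeric differences depend on the scale of the variables, while each categorical mismatch contributes a fixed amount, so the balance between the two has to be chosen (or the numeric attributes rescaled) before the combined value is meaningful.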

Author information

Corresponding author

Correspondence to Giulia Caruso.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Caruso, G., Gattone, S.A., Fortuna, F., Di Battista, T. (2018). Cluster Analysis as a Decision-Making Tool: A Methodological Review. In: Bucciarelli, E., Chen, SH., Corchado, J. (eds) Decision Economics: In the Tradition of Herbert A. Simon's Heritage. DCAI 2017. Advances in Intelligent Systems and Computing, vol 618. Springer, Cham. https://doi.org/10.1007/978-3-319-60882-2_6

  • DOI: https://doi.org/10.1007/978-3-319-60882-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60881-5

  • Online ISBN: 978-3-319-60882-2

  • eBook Packages: Engineering, Engineering (R0)
