Abstract
Cluster analysis has long played an important role in a broad variety of areas, such as psychology, biology, and computer science. It has become established as a valuable tool in marketing and business, thanks to its capacity to support decision-making processes. Traditionally, clustering approaches concentrate on purely numerical or purely categorical data. An important area of cluster analysis deals with mixed data, composed of both numerical and categorical attributes. Clustering mixed data is not simple, because there is a wide gap between the similarity metrics for these two kinds of data. In this review we provide some technical details about the kinds of distances that can be used with mixed data types. Finally, we emphasize that in most applications of cluster analysis practitioners focus on either numeric or categorical variables, lessening the effectiveness of the method as a decision-making tool.
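To illustrate the gap between numerical and categorical similarity, the following minimal sketch combines a squared Euclidean term for the numeric attributes with a simple-matching mismatch count for the categorical ones. It is one common construction and not necessarily the one the chapter adopts. The function name `mixed_dissimilarity` and the weight `gamma` are assumptions introduced here for illustration.

```python
from typing import Hashable, Sequence


def mixed_dissimilarity(
    x_num: Sequence[float],
    y_num: Sequence[float],
    x_cat: Sequence[Hashable],
    y_cat: Sequence[Hashable],
    gamma: float = 1.0,
) -> float:
    """Dissimilarity between two mixed-type records (illustrative sketch).

    numeric part:     sum of squared differences
    categorical part: number of attributes whose categories differ
    gamma:            hypothetical weight balancing the two parts
    """
    if len(x_num) != len(y_num) or len(x_cat) != len(y_cat):
        raise ValueError("records must have the same number of attributes")
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    categorical_part = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return numeric_part + gamma * categorical_part


# Two records with two numeric and two categorical attributes each.
d = mixed_dissimilarity([1.0, 2.0], [1.0, 4.0], ["a", "b"], ["a", "c"], gamma=0.5)
print(d)  # 4.5  (numeric part 4.0 + 0.5 * one mismatch)
```

The choice of `gamma` matters: with unscaled numeric attributes the squared differences can dominate the mismatch count, so in practice numeric variables are usually standardized first or `gamma` is tuned to the data.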
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Caruso, G., Gattone, S.A., Fortuna, F., Di Battista, T. (2018). Cluster Analysis as a Decision-Making Tool: A Methodological Review. In: Bucciarelli, E., Chen, SH., Corchado, J. (eds) Decision Economics: In the Tradition of Herbert A. Simon's Heritage. DCAI 2017. Advances in Intelligent Systems and Computing, vol 618. Springer, Cham. https://doi.org/10.1007/978-3-319-60882-2_6
DOI: https://doi.org/10.1007/978-3-319-60882-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60881-5
Online ISBN: 978-3-319-60882-2
eBook Packages: Engineering, Engineering (R0)