Cluster Analysis as a Decision-Making Tool: A Methodological Review

Caruso, Giulia; Gattone, Stefano Antonio; Fortuna, Francesca; Di Battista, Tonio

doi:10.1007/978-3-319-60882-2_6

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 618)

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence

Abstract

Cluster analysis has long played an important role in a broad variety of areas, such as psychology, biology, and computer science. It has established itself as a valuable tool in marketing and business, thanks to its ability to support decision-making processes. Traditionally, clustering approaches concentrate on purely numerical or purely categorical data. An important area of cluster analysis deals with mixed data, composed of both numerical and categorical attributes. Clustering mixed data is not simple, because there is a wide gap between the similarity metrics for these two kinds of data. In this review we provide some technical details about the kinds of distances that can be used with mixed data types. Finally, we emphasize that in most applications of cluster analysis practitioners focus on either numeric or categorical variables, which lessens the effectiveness of the method as a decision-making tool.
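
To make the gap between numerical and categorical similarity concrete, here is a minimal Python sketch of one way to put both kinds of attribute on a common scale. It is a Gower-style dissimilarity. The abstract does not name a specific distance, so this choice is an assumption for illustration only. The function name `gower_dissimilarity`, the helper `numeric_ranges`, and the toy customer records are likewise hypothetical.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def numeric_ranges(rows: Sequence[Sequence[Value]]) -> List[Optional[float]]:
    """Return max - min for each numeric column, or None for categorical ones."""
    ranges: List[Optional[float]] = []
    for column in zip(*rows):
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(float(max(column) - min(column)))
        else:
            ranges.append(None)
    return ranges


def gower_dissimilarity(
    a: Sequence[Value], b: Sequence[Value], ranges: Sequence[Optional[float]]
) -> float:
    """Average per-attribute dissimilarity, each term scaled to [0, 1].

    Numeric attributes contribute |x - y| / range; categorical attributes
    contribute 0 on a match and 1 on a mismatch (simple matching).
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0
        elif r > 0:
            total += abs(x - y) / r
        # A constant numeric column (r == 0) contributes nothing.
    return total / len(ranges)


# Hypothetical records: (age, income, occupation, purchase channel).
customers = [
    (25.0, 30000.0, "student", "online"),
    (45.0, 80000.0, "manager", "store"),
    (27.0, 32000.0, "student", "store"),
]

ranges = numeric_ranges(customers)
d01 = gower_dissimilarity(customers[0], customers[1], ranges)
d02 = gower_dissimilarity(customers[0], customers[2], ranges)
print(f"d(0, 1) = {d01:.3f}, d(0, 2) = {d02:.3f}")
```

Because every attribute contributes a value between 0 and 1, neither the income column nor the categorical columns dominate the result. Dropping either group of attributes before clustering would discard part of this information.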

Author information

Authors and Affiliations

University G. d’Annunzio, Chieti-Pescara, 66100, Chieti, Italy
Giulia Caruso, Stefano Antonio Gattone, Francesca Fortuna & Tonio Di Battista

Corresponding author

Correspondence to Giulia Caruso.

Editor information

Editors and Affiliations

Department of Philosophical, Pedagogical, and Economic-Quantitative Sciences, Section of Economics and Quantitative Methods, University of Chieti-Pescara, Pescara, Italy
Edgardo Bucciarelli
Department of Economics, National Chengchi University, Taipei, Taiwan
Shu-Heng Chen
Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
Juan M. Corchado

About this paper

Cite this paper

Caruso, G., Gattone, S.A., Fortuna, F., Di Battista, T. (2018). Cluster Analysis as a Decision-Making Tool: A Methodological Review. In: Bucciarelli, E., Chen, SH., Corchado, J. (eds) Decision Economics: In the Tradition of Herbert A. Simon's Heritage. DCAI 2017. Advances in Intelligent Systems and Computing, vol 618. Springer, Cham. https://doi.org/10.1007/978-3-319-60882-2_6

DOI: https://doi.org/10.1007/978-3-319-60882-2_6
Published: 14 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60881-5
Online ISBN: 978-3-319-60882-2
eBook Packages: Engineering, Engineering (R0)