Abstract
Cluster analysis plays an important role in exploring the correlations in data by dividing a dataset into separate clusters so that similar objects are located in the same cluster. Moreover, fuzzy cluster analysis can reveal mixtures of clusters in datasets containing multiple distributions. The outcome of a clustering method is largely determined by its similarity definition, so the similarity measure is crucial to the formation of fuzzy clusters. The similarity between two objects is typically calculated as the mean of the differences across multiple dimensions. However, dissimilarity in some dimensions has little or no effect on the fuzzy clustering outcome. In this study, we explore such impacts for fuzzy clustering of data with categorical attributes. The impact of each attribute on each fuzzy cluster is calculated using an optimizer, and the overlapping dissimilar values are then adjusted by the corresponding weights. We apply this approach to the Fk-centers clustering algorithm, and the experimental results show that the proposed method achieves higher fuzzy silhouette scores than related methods. These results demonstrate the applicability of the proposed method in real-world applications.
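The idea of attribute weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the hand-picked weights are all hypothetical, and the weights here are fixed by hand rather than learned by an optimizer. The sketch assumes a frequency-based cluster center per attribute and a membership update of the form commonly used in fuzzy k-modes-style algorithms.

```python
from collections import Counter

def center_from_cluster(rows):
    """Per-attribute relative frequency of each category in a cluster (hypothetical helper)."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]

def weighted_dissimilarity(x, center, weights):
    """Sum over attributes of weight * (1 - frequency of x's category in the center)."""
    return sum(w * (1.0 - freq.get(v, 0.0))
               for v, freq, w in zip(x, center, weights))

def fuzzy_memberships(dists, m=2.0):
    """Memberships proportional to dists ** (-1 / (m - 1)), normalized to sum to 1."""
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # An object that coincides with one or more centers belongs only to those.
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

# Toy data (assumed): colour separates the two clusters, size does not.
cluster_a = [("red", "small"), ("red", "large")]
cluster_b = [("blue", "small"), ("blue", "large")]
centers = [center_from_cluster(cluster_a), center_from_cluster(cluster_b)]
x = ("red", "small")

# Equal weights: the uninformative size attribute blurs the memberships.
equal_w = [0.5, 0.5]
u_equal = fuzzy_memberships(
    [weighted_dissimilarity(x, c, equal_w) for c in centers])

# Hand-picked weights that down-weight the uninformative attribute.
tuned_w = [0.9, 0.1]
u_tuned = fuzzy_memberships(
    [weighted_dissimilarity(x, c, tuned_w) for c in centers])

print(u_equal, u_tuned)
```

In this toy case, down-weighting the attribute that does not distinguish the clusters sharpens the membership of `x` in the first cluster. In the paper the weights are per attribute and per cluster and are produced by an optimizer; the sketch uses one shared weight vector only to keep the example short.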
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mau, T.N., Huynh, V.-N. (2021). Automated Attribute Weighting Fuzzy k-Centers Algorithm for Categorical Data Clustering. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2021. Lecture Notes in Computer Science, vol 12898. Springer, Cham. https://doi.org/10.1007/978-3-030-85529-1_17
DOI: https://doi.org/10.1007/978-3-030-85529-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85528-4
Online ISBN: 978-3-030-85529-1
eBook Packages: Computer Science (R0)