Automated Attribute Weighting Fuzzy k-Centers Algorithm for Categorical Data Clustering

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2021)

Abstract

Cluster analysis plays an important role in exploring correlations in data by partitioning a dataset into clusters so that similar objects fall in the same cluster. Fuzzy cluster analysis can additionally reveal mixtures of clusters in datasets that contain multiple distributions. The outcome of a clustering method is largely determined by its definition of similarity, so the similarity measure is critical to the formation of fuzzy clusters. The similarity between two objects is usually computed as the mean of their differences across all dimensions. However, the dissimilarity in some dimensions has little or no effect on the fuzzy clustering outcome. In this study, we explore this effect for fuzzy clustering of data with categorical attributes. The impact of each attribute on each fuzzy cluster is calculated using an optimizer, and the overlapping dissimilar values are then adjusted by the corresponding weights. We apply this approach to the Fk-centers clustering algorithm, and the experimental results show that the proposed method achieves higher fuzzy silhouette scores than related methods. These results demonstrate the applicability of the proposed method to real-world applications.
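
As a rough illustration of the weighting idea described in the abstract, the sketch below computes a dissimilarity in which each attribute's mismatch is scaled by a per-cluster weight, and then derives fuzzy memberships from it. It is not the authors' method: it assumes integer-coded categories, a simple-matching (0/1 mismatch) dissimilarity against one representative value per attribute, and the standard fuzzy membership update with fuzzifier m. The weights are fixed by hand rather than learned by the optimizer mentioned above, and the function names are hypothetical.

```python
import numpy as np

def weighted_mismatch(X, centers, weights):
    """Per-cluster attribute-weighted simple-matching dissimilarity.

    X:       (n, d) integer-coded categorical data
    centers: (k, d) one representative value per attribute for each cluster
    weights: (k, d) non-negative attribute weights, one row per cluster
    Returns an (n, k) dissimilarity matrix.
    """
    # mismatch[i, c, a] is 1.0 where object i differs from center c on attribute a
    mismatch = (X[:, None, :] != centers[None, :, :]).astype(float)
    # Scale each attribute's mismatch by that cluster's weight, then sum over attributes
    return (mismatch * weights[None, :, :]).sum(axis=2)

def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Standard fuzzy membership update from an (n, k) dissimilarity matrix."""
    # eps avoids division by zero when an object coincides with a center
    inv = (D + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data: 3 objects, 3 categorical attributes, 2 clusters (all values assumed)
X = np.array([[0, 1, 2],
              [0, 1, 0],
              [1, 0, 2]])
centers = np.array([[0, 1, 2],
                    [1, 0, 2]])
# Cluster 0 ignores the third attribute entirely (weight 0.0)
weights = np.array([[0.5, 0.5, 0.0],
                    [0.4, 0.4, 0.2]])

D = weighted_mismatch(X, centers, weights)
U = fuzzy_memberships(D)
```

In this toy example the second object differs from the first center only on the third attribute. Because cluster 0 assigns that attribute zero weight, the object is still treated as coinciding with that center, which is the kind of effect attribute weighting is meant to capture.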

Correspondence to Toan Nguyen Mau.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Mau, T.N., Huynh, V.-N. (2021). Automated Attribute Weighting Fuzzy k-Centers Algorithm for Categorical Data Clustering. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2021. Lecture Notes in Computer Science, vol 12898. Springer, Cham. https://doi.org/10.1007/978-3-030-85529-1_17

  • DOI: https://doi.org/10.1007/978-3-030-85529-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85528-4

  • Online ISBN: 978-3-030-85529-1

  • eBook Packages: Computer Science (R0)
