Cluster Center Initialization for Fuzzy K-Modes Clustering Using Outlier Detection Technique

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15031))

Abstract

The fuzzy K-modes clustering algorithm extends the fuzzy K-means clustering algorithm so that it can handle massive categorical data. However, the quality of the initial cluster centers (hereafter, initial centers) can significantly affect its results, and unsuitable initial centers often lead to poor clusterings. The selection of initial centers, that is, cluster center initialization (CCI), is therefore a key issue in fuzzy K-modes clustering. This paper approaches the CCI problem from the perspective of outlier detection and proposes a cluster center initialization algorithm, CCI_DOFD, for fuzzy K-modes clustering. CCI_DOFD selects initial centers using the distance outlier factor of each object, the density of each object, and the distances between objects. By taking the distance outlier factor into account, CCI_DOFD avoids selecting an outlier as an initial center. Moreover, when calculating the density of each object and the distances between objects, CCI_DOFD weights each attribute according to its significance, so that differences between attributes are reflected in both measures. Experimental results on several UCI data sets demonstrate the effectiveness of the algorithm for the CCI of fuzzy K-modes clustering.
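
The abstract names the ingredients of CCI_DOFD but not its formulas, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. Everything specific in it is an assumption: normalized attribute entropy stands in for attribute significance, the outlier score is the fraction of objects lying farther away than a hypothetical `radius`, density is one minus the mean weighted distance, and later centers maximize density times distance to the nearest chosen center. The names `attribute_weights`, `weighted_distance`, `select_initial_centers`, `radius`, and `outlier_threshold` are all hypothetical.

```python
from collections import Counter
from math import log2


def attribute_weights(data):
    """Assumed stand-in for attribute significance: normalized Shannon entropy."""
    n, m = len(data), len(data[0])
    entropies = []
    for j in range(m):
        counts = Counter(row[j] for row in data)
        entropies.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    total = sum(entropies)
    # Fall back to uniform weights when every attribute is constant.
    return [h / total for h in entropies] if total > 0 else [1.0 / m] * m


def weighted_distance(x, y, weights):
    """Weighted simple-matching distance: sum the weights of mismatched attributes."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def select_initial_centers(data, k, radius=0.5, outlier_threshold=0.9):
    """Return indices of k initial centers, skipping likely outliers (assumed rules)."""
    n = len(data)
    if not 1 <= k <= n or n < 2:
        raise ValueError("need at least 2 objects and 1 <= k <= number of objects")
    weights = attribute_weights(data)
    dist = [[weighted_distance(a, b, weights) for b in data] for a in data]

    # Assumed outlier score: fraction of the other objects farther than `radius`.
    outlier_score = [sum(d > radius for d in row) / (n - 1) for row in dist]
    # Assumed density: one minus the mean weighted distance to the other objects.
    density = [1.0 - sum(row) / (n - 1) for row in dist]

    # Objects with a high outlier score are not eligible as initial centers.
    candidates = [i for i in range(n) if outlier_score[i] < outlier_threshold]
    if len(candidates) < k:  # too strict a threshold: fall back to all objects
        candidates = list(range(n))

    # First center: the densest eligible object.
    centers = [max(candidates, key=lambda i: density[i])]
    # Remaining centers: dense objects that are far from every chosen center.
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        centers.append(max(
            remaining,
            key=lambda i: density[i] * min(dist[i][c] for c in centers),
        ))
    return centers


if __name__ == "__main__":
    toy = [
        ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),  # first group
        ("b", "y", "r"), ("b", "y", "r"), ("b", "y", "s"),  # second group
        ("c", "z", "t"),                                    # isolated object
    ]
    print(select_initial_centers(toy, k=2))
```

On the toy data, the isolated object differs from every other object on all attributes, so its assumed outlier score reaches the threshold and it is excluded from the candidates. The resulting indices could then seed any fuzzy K-modes implementation in place of random initial centers.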

Acknowledgements

This work is supported by the National Natural Science Foundation of China (grant nos. 61973180, 62202253), and the Natural Science Foundation of Shandong Province, China (grant nos. ZR2022MF326, ZR2021QF074, ZR2021MF092).

Author information

Corresponding author

Correspondence to Feng Jiang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sha, Y., Du, J., Yang, Z., Jiang, F. (2025). Cluster Center Initialization for Fuzzy K-Modes Clustering Using Outlier Detection Technique. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15031. Springer, Singapore. https://doi.org/10.1007/978-981-97-8487-5_1

  • DOI: https://doi.org/10.1007/978-981-97-8487-5_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8486-8

  • Online ISBN: 978-981-97-8487-5

  • eBook Packages: Computer Science, Computer Science (R0)
