Abstract
The fuzzy K-modes clustering algorithm is an extension of the fuzzy K-means clustering algorithm that can handle large categorical data sets. However, the quality of the initial cluster centers (initial centers for short) can significantly affect the results of fuzzy K-modes clustering, and in many cases unsuitable initial centers lead to poor clustering results. Therefore, the selection of initial centers, that is, cluster center initialization (CCI), is a key issue in fuzzy K-modes clustering. This paper addresses the CCI problem of fuzzy K-modes clustering from the perspective of outlier detection and proposes a cluster center initialization algorithm for fuzzy K-modes clustering, called CCI_DOFD. CCI_DOFD selects initial centers based on the distance outlier factor of each object, the density of each object, and the distances between objects. By considering the distance outlier factor, CCI_DOFD can avoid selecting an outlier as an initial center. Moreover, when calculating the density of each object and the distances between objects, CCI_DOFD assigns different weights to different attributes according to the significance of each attribute, which effectively reflects the differences between attributes. Experimental results on several UCI data sets demonstrate the effectiveness of CCI_DOFD for the CCI of fuzzy K-modes clustering.
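The abstract does not give CCI_DOFD's formulas, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' method. It assumes a weighted overlap distance, a frequency-based density, a distance outlier factor defined in the spirit of distance-based outliers (the fraction of other objects lying farther than a radius), and a greedy rule: take the densest non-outlier first, then repeatedly take the non-outlier that maximizes density times distance to its nearest chosen center. The function names, the `radius` and `outlier_threshold` parameters, and the caller-supplied weights `w` (a stand-in for the paper's attribute-significance weights, which are not reproduced here) are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Obj = Tuple[str, ...]  # one categorical object = a tuple of attribute values


def weighted_distance(x: Obj, y: Obj, w: Sequence[float]) -> float:
    """Weighted overlap distance: total weight of the attributes on which x and y differ."""
    return sum(wj for xj, yj, wj in zip(x, y, w) if xj != yj)


def densities(data: Sequence[Obj], w: Sequence[float]) -> List[float]:
    """Assumed frequency-based density: weighted mean, over attributes,
    of the fraction of objects sharing the object's value."""
    n, total_w = len(data), sum(w)
    counts = [Counter(obj[j] for obj in data) for j in range(len(w))]
    return [
        sum(w[j] * counts[j][obj[j]] / n for j in range(len(w))) / total_w
        for obj in data
    ]


def distance_outlier_factors(
    data: Sequence[Obj], w: Sequence[float], radius: float
) -> List[float]:
    """Assumed distance outlier factor: fraction of the other objects
    lying farther than `radius` from each object (1.0 = fully isolated)."""
    n = len(data)
    if n < 2:
        return [0.0] * n
    return [
        sum(1 for m, y in enumerate(data) if m != i and weighted_distance(x, y, w) > radius)
        / (n - 1)
        for i, x in enumerate(data)
    ]


def init_centers(
    data: Sequence[Obj],
    k: int,
    w: Optional[Sequence[float]] = None,
    radius: float = 1.0,
    outlier_threshold: float = 0.9,
) -> List[Obj]:
    """Hypothetical outlier-aware greedy initialization (not the paper's exact rules)."""
    if not data or k < 1:
        raise ValueError("need a non-empty data set and k >= 1")
    w = list(w) if w is not None else [1.0] * len(data[0])  # uniform weights by default
    dens = densities(data, w)
    dof = distance_outlier_factors(data, w, radius)
    # Objects whose outlier factor reaches the threshold are never chosen as centers.
    candidates = [i for i in range(len(data)) if dof[i] < outlier_threshold]
    if len(candidates) < k:
        raise ValueError("fewer non-outlier candidates than k")
    # First center: the densest non-outlier object (ties go to the earliest index).
    chosen = [max(candidates, key=lambda i: dens[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        # Next center: high density and far from every center chosen so far.
        chosen.append(
            max(
                remaining,
                key=lambda i: dens[i]
                * min(weighted_distance(data[i], data[c], w) for c in chosen),
            )
        )
    return [data[i] for i in chosen]


if __name__ == "__main__":
    data = [
        ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
        ("b", "y", "r"), ("b", "y", "r"), ("b", "y", "s"),
        ("c", "z", "t"),  # isolated object
    ]
    print(init_centers(data, 3))
    # [('a', 'x', 'p'), ('b', 'y', 'r'), ('a', 'x', 'q')]
    print(init_centers(data, 3, outlier_threshold=1.01))  # filter effectively disabled
    # [('a', 'x', 'p'), ('b', 'y', 'r'), ('c', 'z', 't')]
```

On this toy data the isolated object `('c', 'z', 't')` has an outlier factor of 1.0, so the filter excludes it; with the filter disabled, its large distance to both existing centers outweighs its low density and it is picked as the third center, which illustrates the kind of failure the abstract says the distance outlier factor is meant to prevent.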
Acknowledgements
This work is supported by the National Natural Science Foundation of China (grant nos. 61973180, 62202253), and the Natural Science Foundation of Shandong Province, China (grant nos. ZR2022MF326, ZR2021QF074, ZR2021MF092).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sha, Y., Du, J., Yang, Z., Jiang, F. (2025). Cluster Center Initialization for Fuzzy K-Modes Clustering Using Outlier Detection Technique. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15031. Springer, Singapore. https://doi.org/10.1007/978-981-97-8487-5_1
DOI: https://doi.org/10.1007/978-981-97-8487-5_1
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8486-8
Online ISBN: 978-981-97-8487-5
eBook Packages: Computer Science, Computer Science (R0)