Cluster Center Initialization for Fuzzy K-Modes Clustering Using Outlier Detection Technique

Sha, Yuqi; Du, Junwei; Yang, Zhiyong; Jiang, Feng

doi:10.1007/978-981-97-8487-5_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15031)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

The fuzzy K-modes clustering algorithm extends the fuzzy K-means clustering algorithm to handle massive categorical data. However, the quality of the initial cluster centers (also called initial centers) can significantly affect its results, and unsuitable initial centers often lead to poor clusterings. The selection of initial centers, that is, cluster center initialization (CCI), is therefore a key issue in fuzzy K-modes clustering. This paper addresses the CCI problem from the perspective of outlier detection and proposes a cluster center initialization algorithm, CCI_DOFD, for fuzzy K-modes clustering. CCI_DOFD selects initial centers using the distance outlier factor of each object, the density of each object, and the distances between objects. By taking the distance outlier factor into account, CCI_DOFD avoids selecting an outlier as an initial center. Moreover, when calculating the density of each object and the distances between objects, CCI_DOFD weights each attribute according to its significance, which reflects the differences between attributes. Experimental results on several UCI data sets demonstrate the effectiveness of the algorithm for the CCI of fuzzy K-modes clustering.
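
The abstract names three ingredients (a distance outlier factor, a per-object density, and pairwise distances under attribute weights) but gives no formulas. The Python sketch below illustrates how such ingredients could be combined to pick initial centers for categorical data. It is not the authors' CCI_DOFD. Every definition in it is an assumption made for illustration: the weighted simple-matching distance, the density as one minus the mean distance to all objects, the outlier factor as the fraction of other objects farther away than a hypothetical `radius`, and the equal default weights standing in for attribute significance.

```python
def weighted_mismatch(x, y, weights):
    """Weighted simple-matching distance: sum the weights of differing attributes."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def select_initial_centers(data, k, weights=None, radius=0.5):
    """Return the indices of k initial centers (illustrative only, not CCI_DOFD).

    Assumes len(data) >= 2 and k <= len(data).
    """
    n, m = len(data), len(data[0])
    if weights is None:
        # Assumption: equal weights stand in for attribute significance.
        weights = [1.0 / m] * m
    total = sum(weights)

    # Pairwise distances, normalised to the range [0, 1].
    dist = [[weighted_mismatch(data[i], data[j], weights) / total
             for j in range(n)] for i in range(n)]

    # Assumed density: objects that are close to many others score high.
    density = [1.0 - sum(row) / n for row in dist]

    # Assumed outlier factor: fraction of other objects farther than `radius`.
    outlier = [sum(1 for j in range(n) if j != i and dist[i][j] > radius) / (n - 1)
               for i in range(n)]

    # Dense, non-outlying objects make good center candidates.
    quality = [density[i] * (1.0 - outlier[i]) for i in range(n)]

    # First center: the best-quality object.
    centers = [max(range(n), key=lambda i: quality[i])]

    # Further centers: good quality and far from every center chosen so far.
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        centers.append(max(
            candidates,
            key=lambda i: min(dist[i][c] for c in centers) * quality[i],
        ))
    return centers


# Toy data: two groups of three objects and one object that matches nothing.
data = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
    ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"),
    ("c", "z", "r"),
]
print(select_initial_centers(data, k=2))
```

On this toy data the last object differs from every other object in all attributes, so its assumed outlier factor is 1 and its quality is 0, and it is never preferred over an object from either group. The second center is pushed into the other group by the minimum-distance term.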

Acknowledgements

This work is supported by the National Natural Science Foundation of China (grant nos. 61973180, 62202253), and the Natural Science Foundation of Shandong Province, China (grant nos. ZR2022MF326, ZR2021QF074, ZR2021MF092).

Author information

Authors and Affiliations

College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
Yuqi Sha, Zhiyong Yang & Feng Jiang
School of Data Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
Junwei Du

Corresponding author

Correspondence to Feng Jiang.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhouchen Lin
Nankai University, Tianjin, China
Ming-Ming Cheng
Chinese Academy of Sciences, Beijing, China
Ran He
Xinjiang University, Urumqi, Xinjiang, China
Kurban Ubul
Xinjiang University, Urumqi, China
Wushouer Silamu
Peking University, Beijing, China
Hongbin Zha
Tsinghua University, Beijing, China
Jie Zhou
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu

About this paper

Cite this paper

Sha, Y., Du, J., Yang, Z., Jiang, F. (2025). Cluster Center Initialization for Fuzzy K-Modes Clustering Using Outlier Detection Technique. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15031. Springer, Singapore. https://doi.org/10.1007/978-981-97-8487-5_1

DOI: https://doi.org/10.1007/978-981-97-8487-5_1
Published: 04 November 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8486-8
Online ISBN: 978-981-97-8487-5
eBook Packages: Computer Science, Computer Science (R0)
