High-Dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution

Shen, Yinghua; E, Hanyu; Chen, Tianhua; Xiao, Zhi; Liu, Bingsheng; Chen, Yuan

doi:10.1007/978-3-030-85030-2_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12861)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

The Fuzzy C-Means (FCM) clustering algorithm is a popular unsupervised learning approach that has been used extensively in various domains. In this study, however, we point out a major problem that FCM faces when it is applied to high-dimensional data: quite often the obtained prototypes (cluster centers) cannot be distinguished from one another. Many studies have claimed that the concentration of distances (CoD) could be a major reason for this phenomenon. This paper therefore revisits this factor and highlights that the CoD can not only degrade performance but sometimes also contribute positively to the performance of the clustering algorithm. Instead, this paper points out the significance of noisy and correlated features, which can have a negative effect on FCM performance. To tackle this problem, we resort to a neural network model, the autoencoder, to reduce the dimensionality of the feature space while extracting the most informative features. We conduct several experiments to show the validity of the proposed strategy for the FCM algorithm.
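To make the problem statement concrete, the following is a minimal NumPy sketch of the textbook FCM iteration (alternating prototype and membership updates), together with a helper that reports how far apart the resulting prototypes are. It is an illustration under stated assumptions and not the authors' implementation: the function names `fuzzy_c_means` and `prototype_separation`, the fuzzifier `m = 2`, the synthetic data, and the choice of pure-noise extra features are all hypothetical. The autoencoder step proposed in the paper is not shown here.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM: alternate prototype and membership updates.

    X is an (n, d) data matrix, c the number of clusters, m > 1 the fuzzifier.
    Returns prototypes V (c, d) and the partition matrix U (n, c).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial partition matrix; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototypes are membership-weighted means of the data.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every prototype.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def prototype_separation(V):
    """Smallest pairwise Euclidean distance between prototypes."""
    diff = V[:, None, :] - V[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist[np.triu_indices(len(V), k=1)].min()


if __name__ == "__main__":
    # Hypothetical experiment: two clusters in 2 informative features,
    # padded with an increasing number of pure-noise features.
    rng = np.random.default_rng(42)
    informative = np.vstack([
        rng.normal(0.0, 1.0, (100, 2)),
        rng.normal(4.0, 1.0, (100, 2)),
    ])
    for n_noise in (0, 50, 500):
        noise = rng.normal(0.0, 1.0, (200, n_noise))
        X = np.hstack([informative, noise])
        V, _ = fuzzy_c_means(X, c=2)
        print(n_noise, "noise features -> prototype separation:",
              prototype_separation(V))
```

A separation that falls toward zero as noise features are added would correspond to the situation described above, where the prototypes can no longer be told apart; the exact behavior depends on the data, the fuzzifier, and the initialization.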

This work was supported in part by the National Natural Science Foundation of China under Grant 72001032, Grant 72071021, and Grant 72002152, and in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-bshX0013.

Author information

Authors and Affiliations

School of Economics and Business Administration, Chongqing University, Chongqing, China
Yinghua Shen & Zhi Xiao
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
Hanyu E
Department of Computer Science, University of Huddersfield, Huddersfield, UK
Tianhua Chen
School of Public Affairs, Chongqing University, Chongqing, China
Bingsheng Liu
College of Management and Economics, Tianjin University, Tianjin, China
Yuan Chen

Corresponding author

Correspondence to Yinghua Shen.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Málaga, Málaga, Spain
Gonzalo Joya
Technical University of Catalonia, Barcelona, Spain
Andreu Català

About this paper

Cite this paper

Shen, Y., E, H., Chen, T., Xiao, Z., Liu, B., Chen, Y. (2021). High-Dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_8

DOI: https://doi.org/10.1007/978-3-030-85030-2_8
Published: 21 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85029-6
Online ISBN: 978-3-030-85030-2
eBook Packages: Computer Science, Computer Science (R0)
