Abstract
The Fuzzy C-Means (FCM) clustering algorithm is a popular unsupervised learning approach that has been used extensively in various domains. In this study, however, we point out a major problem that FCM faces when it is applied to high-dimensional data: quite often, the obtained prototypes (cluster centers) cannot be distinguished from one another. Many studies have claimed that the concentration of distances (CoD) could be a major reason for this phenomenon. This paper therefore revisits this factor and highlights that the CoD does not only degrade performance; it can sometimes also contribute to improved performance of the clustering algorithm. Instead, this paper points out the significance of noisy and correlated features, which can have a negative effect on FCM performance. Hence, to tackle this problem, we resort to a neural network model, namely the autoencoder, to reduce the dimensionality of the feature space while extracting the most informative features. We conduct several experiments to show the validity of the proposed strategy for the FCM algorithm.
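For orientation, the two ingredients named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' experimental code. It assumes the textbook FCM updates (Bezdek) with squared Euclidean distance and fuzzifier m = 2, and it measures the concentration of distances with the relative contrast (d_max - d_min) / d_min in the sense of Beyer et al. The function names `fcm` and `relative_contrast` are hypothetical, and the autoencoder step is not shown.

```python
import numpy as np


def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (prototypes V, memberships U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial partition matrix; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Prototype update: means weighted by memberships raised to m.
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every prototype.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute the prototypes so they match the final memberships.
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    return V, U


def relative_contrast(X, query):
    """(d_max - d_min) / d_min for distances from `query` to the rows of X."""
    d = np.linalg.norm(X - query, axis=1)
    return (d.max() - d.min()) / d.min()


# Illustration on synthetic i.i.d. uniform data (an assumption, not the
# paper's data sets): the relative contrast shrinks as dimension grows.
rng = np.random.default_rng(0)
for dim in (2, 1000):
    X = rng.random((500, dim))
    q = rng.random(dim)
    print(dim, relative_contrast(X, q))
```

On data like this, a small relative contrast means that all points are almost equally far from any prototype, so FCM memberships drift toward 1/c. That is one way to see why prototypes can become hard to tell apart in high dimensions.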
This work was supported in part by the National Natural Science Foundation of China under Grant 72001032, Grant 72071021, and Grant 72002152, and in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-bshX0013.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shen, Y., E, H., Chen, T., Xiao, Z., Liu, B., Chen, Y. (2021). High-Dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_8
DOI: https://doi.org/10.1007/978-3-030-85030-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85029-6
Online ISBN: 978-3-030-85030-2
eBook Packages: Computer Science, Computer Science (R0)