High-Dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution

  • Conference paper
Advances in Computational Intelligence (IWANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12861)

Abstract

The Fuzzy C-Means (FCM) clustering algorithm is a popular unsupervised learning approach that has been used extensively in various domains. In this study, however, we point out a major problem that FCM faces when it is applied to high-dimensional data: quite often, the obtained prototypes (cluster centers) cannot be distinguished from each other. Many studies have claimed that the concentration of distance (CoD) could be a major reason for this phenomenon. This paper therefore revisits this factor and highlights that the CoD can not only lead to decreased performance but can sometimes also contribute positively to the performance of the clustering algorithm. Instead, this paper points out the significance of noisy and correlated features, which can have a negative effect on FCM performance. Hence, to tackle this problem, we resort to a neural network model, namely the autoencoder, to reduce the dimensionality of the feature space while extracting the most informative features. We conduct several experiments to show the validity of the proposed strategy for the FCM algorithm.
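For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the standard FCM alternating updates with Euclidean distance and a fuzzifier m = 2, and it uses the relative contrast (d_max - d_min) / d_min as one common way to quantify the concentration of distance. The function names, parameter defaults, and stopping rule are illustrative assumptions. The autoencoder step is not shown.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM sketch: alternate prototype and membership updates.

    X is an (n, d) data matrix, c the number of clusters, and m > 1 the
    fuzzifier. Returns prototypes V (c, d) and memberships U (n, c).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototypes are membership-weighted means of the data.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every prototype.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def relative_contrast(X, seed=0):
    """(d_max - d_min) / d_min for distances from one random query point.

    Values near zero indicate that distances have concentrated.
    """
    rng = np.random.default_rng(seed)
    q = X[rng.integers(len(X))]
    d = np.linalg.norm(X - q, axis=1)
    d = d[d > 0]  # drop the query point itself
    return (d.max() - d.min()) / d.min()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated blobs in 2-D: prototypes should be distinct.
    X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                   rng.normal(10.0, 0.5, (50, 2))])
    V, U = fuzzy_c_means(X, c=2)
    print("prototypes:\n", V)
    # Relative contrast for uniform data in low and high dimensions.
    print("contrast, d=2:   ", relative_contrast(rng.random((500, 2))))
    print("contrast, d=1000:", relative_contrast(rng.random((500, 1000))))
```

In a sketch like this, one way to check whether prototypes have become indistinguishable is to inspect the pairwise distances between the rows of V. Distances close to zero correspond to the problem the abstract describes.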

This work was supported in part by the National Natural Science Foundation of China under Grants 72001032, 72071021, and 72002152, and in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-bshX0013.

Author information

Corresponding author

Correspondence to Yinghua Shen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Shen, Y., E, H., Chen, T., Xiao, Z., Liu, B., Chen, Y. (2021). High-Dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_8

  • DOI: https://doi.org/10.1007/978-3-030-85030-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85029-6

  • Online ISBN: 978-3-030-85030-2

  • eBook Packages: Computer Science, Computer Science (R0)
