Abstract
Fuzzy document clustering aims to organize related documents into clusters automatically and flexibly. In this context, the topics addressed by the documents in each cluster are identified by automatically discovering cluster descriptors, which are relevant terms present in those documents. Because documents are represented in a high-dimensional feature space, extracting good descriptors is a difficult problem. It is harder still under fuzzy clustering, since the same descriptor can be representative of more than one cluster. Moreover, it is well known that the Fuzzy C-Means clustering algorithm is also affected by document dimensionality, and choosing the correct partition of a given document collection into clusters remains a challenging problem. To address this drawback, we investigated the most common fuzzy clustering validity indexes, which are typically used to evaluate fuzzy clusters from low-dimensional data sets, to assess how well they validate the organization of data with a high-dimensional feature space.
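As an illustration of what a fuzzy clustering validity index computes, the following minimal sketch implements two widely used ones: Bezdek's partition coefficient and the Xie-Beni index. Whether these match the exact set of indexes evaluated in the paper is an assumption, and the function names, the list-of-lists data layout, and the fuzzifier default `m=2.0` are hypothetical choices made only for this example.

```python
def partition_coefficient(u):
    """Bezdek's partition coefficient: mean of the squared memberships.

    u[i][k] is the membership of document k in cluster i (illustrative
    layout); each column is assumed to sum to 1. Values range from 1/c
    (maximally fuzzy) to 1 (crisp partition), so higher is better.
    """
    n = len(u[0])
    return sum(mu * mu for row in u for mu in row) / n


def xie_beni(x, centers, u, m=2.0):
    """Xie-Beni index: compactness over separation, so lower is better.

    x is a list of feature vectors, centers a list of cluster prototypes.
    The original index uses squared memberships; raising them to the
    fuzzifier m is a common generalization (assumed here).
    """
    def sqdist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    n = len(x)
    c = len(centers)
    # Membership-weighted within-cluster scatter.
    compactness = sum(
        (u[i][k] ** m) * sqdist(x[k], centers[i])
        for i in range(c)
        for k in range(n)
    )
    # Smallest squared distance between any two prototypes.
    separation = min(
        sqdist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )
    return compactness / (n * separation)
```

Both functions take a membership matrix such as the one produced by Fuzzy C-Means. Candidate partitions with different numbers of clusters can then be compared by preferring a high partition coefficient or a low Xie-Beni value. In high-dimensional document spaces, distances between points tend to become similar, which is one reason the behaviour of such indexes there deserves separate study.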
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Eustáquio, F., Camargo, H., Rezende, S., Nogueira, T. (2018). On Fuzzy Cluster Validity Indexes for High Dimensional Feature Space. In: Kacprzyk, J., Szmidt, E., Zadrożny, S., Atanassov, K., Krawczak, M. (eds) Advances in Fuzzy Logic and Technology 2017. EUSFLAT IWIFSGN 2017 2017. Advances in Intelligent Systems and Computing, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-319-66824-6_2
DOI: https://doi.org/10.1007/978-3-319-66824-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66823-9
Online ISBN: 978-3-319-66824-6
eBook Packages: Engineering, Engineering (R0)