Abstract
This paper presents an approach that combines fuzzy logic techniques with self-organizing maps (SOM) to handle conceptual aspects in document clustering and to reduce training time. To measure the degree to which a concept is present in a document, a concept frequency formula is introduced. This formula builds on new fuzzy formulas for the degree of polysemy of a term and the degree of synonymy between terms. The approach also incorporates several fuzzy improvements: automatic choice of the map topology, heuristic map initialization, a fuzzy similarity measure, and a keyword extraction process. Experiments on the Reuters collection compare the proposed system with classic SOM approaches, with performance measured in terms of F-measure and training time. The experimental results show that the proposed approach obtains good results with less training time than classic SOM techniques.
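The abstract does not state which F-measure variant is used. As an illustration only, the following minimal sketch computes the class-weighted clustering F-measure that is common in document clustering evaluation: for each reference class, take the best F1 over all clusters, then average those values weighted by class size. The function name `clustering_f_measure` and its arguments are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Class-weighted clustering F-measure (assumed variant, not the paper's).

    labels   -- reference class of each document
    clusters -- cluster (e.g. SOM unit) assigned to each document
    """
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0

    class_sizes = Counter(labels)            # n_i: documents per class
    cluster_sizes = Counter(clusters)        # n_j: documents per cluster
    joint = Counter(zip(labels, clusters))   # n_ij: class i inside cluster j

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        # Weight the best match for this class by the class's share of documents.
        total += (n_i / n) * best
    return total
```

For example, `clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])` scores a perfect clustering, while placing all four documents in a single cluster lowers the score because precision drops for both classes.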
Acknowledgments
This research has been partially supported by the TIN2007-67494 F-META project (MEC-FEDER, Spain) and the PEIC09-0196-3018 SCAIWEB-2 excellence project of the Autonomous Government of Castilla-La Mancha (Spain).
Cite this article
Romero, F.P., Peralta, A., Soto, A. et al. Fuzzy optimized self-organizing maps and their application to document clustering. Soft Comput 14, 857–867 (2010). https://doi.org/10.1007/s00500-009-0468-3