Fuzzy optimized self-organizing maps and their application to document clustering

Soft Computing

Abstract

This paper presents an approach that combines fuzzy logic techniques with self-organizing maps (SOM) to handle conceptual aspects of document clusters and to reduce training time. A concept frequency formula is introduced to measure the degree to which a concept is present in a document. This formula builds on new fuzzy formulas that calculate the polysemy degree of a term and the synonymy degree between terms. The approach also uses new fuzzy improvements, such as automatic choice of the map topology, heuristic map initialization, a fuzzy similarity measure and a keyword extraction process. Experiments on the Reuters collection compare the proposed system with classic SOM approaches, with performance measured in terms of F-measure and training time. The experimental results show that the proposed approach produces good results with less training time than classic SOM techniques.
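The abstract does not state which variant of the F-measure is used. As a rough illustration only, the sketch below computes the overall clustering F-measure that is common in the document clustering literature: for each reference class, take the best F1 score over all clusters, then average these scores weighted by class size. The function name `clustering_f_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Class-size-weighted best-match F-measure for a flat clustering.

    Assumption: this is the variant commonly used to evaluate document
    clustering; the paper's exact definition may differ.
    labels   -- reference class of each document
    clusters -- cluster (e.g. SOM unit) assigned to each document
    """
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(labels)            # n_i: documents per class
    cluster_sizes = Counter(clusters)        # n_j: documents per cluster
    joint = Counter(zip(labels, clusters))   # n_ij: class i inside cluster j

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        # Each class contributes its best F1, weighted by its share of documents.
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Toy data: four documents, two reference classes, two clusters.
    toy_labels = ["earn", "earn", "grain", "grain"]
    toy_clusters = [0, 0, 1, 1]
    print(clustering_f_measure(toy_labels, toy_clusters))
```

A score of 1 means every class is matched exactly by some cluster. Merging all documents into a single cluster keeps recall at 1 but lowers precision, so the score drops.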

Notes

  1. http://www.daviddlewis.com/resources/testcollections.

Acknowledgments

This research was partially supported by the TIN2007-67494 F-META project (MEC-FEDER, Spain) and the PEIC09-0196-3018 SCAIWEB-2 excellence project of the Autonomous Government of Castilla-La Mancha (Spain).

Author information

Correspondence to Francisco P. Romero.

About this article

Cite this article

Romero, F.P., Peralta, A., Soto, A. et al. Fuzzy optimized self-organizing maps and their application to document clustering. Soft Comput 14, 857–867 (2010). https://doi.org/10.1007/s00500-009-0468-3

  • DOI: https://doi.org/10.1007/s00500-009-0468-3
