A Hybrid Model for Document Clustering Based on a Fuzzy Approach of Synonymy and Polysemy

  • Chapter in Theoretical Advances and Applications of Fuzzy Logic and Soft Computing

Part of the book series: Advances in Soft Computing (AINSC, volume 42)

Abstract

A new model for document clustering is proposed to handle the conceptual aspects of documents. To measure the degree of presence of a concept in a document (or in a whole document collection), a concept frequency formula is introduced. This formula builds on new fuzzy formulas that calculate the degrees of synonymy and polysemy between terms. To address several shortcomings of classical clustering algorithms, a soft approach based on a hybrid model is proposed. The clustering procedure is implemented by two connected, tailored algorithms that together build a fuzzy hierarchical structure: a fuzzy hierarchical clustering algorithm determines an initial clustering, and an improved soft clustering algorithm completes the process. Experiments show that clustering with this model tends to perform better than the classical approach.
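The abstract does not reproduce the chapter's formulas, so the following Python sketch is only a generic illustration of the two ideas it names, not the authors' method. It assumes a hypothetical table of synonymy degrees (`SYNONYMY`), a simple synonymy-weighted concept frequency, and a two-stage procedure in which a plain (crisp) agglomerative pass supplies the initial centroids and standard fuzzy c-means stands in for the "improved soft clustering algorithm". All names and numeric values are illustrative assumptions.

```python
from collections import Counter
from math import dist

# Hypothetical fuzzy synonymy degrees in [0, 1]; values are illustrative only.
SYNONYMY = {
    "car": {"car": 1.0, "automobile": 0.9, "vehicle": 0.6},
    "bank": {"bank": 1.0, "lender": 0.7},
}

def concept_frequency(tokens, concept):
    """Synonymy-weighted share of tokens that evoke `concept` (assumed form)."""
    counts = Counter(tokens)
    weighted = sum(deg * counts[t] for t, deg in SYNONYMY[concept].items())
    return weighted / len(tokens) if tokens else 0.0

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def agglomerate(vectors, k):
    """Stage 1: merge the two closest clusters (centroid linkage) until k remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        pairs = [(dist(centroid(a), centroid(b)), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]

def memberships_for(vectors, centers, m):
    """Fuzzy c-means membership degrees; each row sums to 1."""
    rows = []
    for v in vectors:
        inv = [max(dist(v, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows

def fuzzy_refine(vectors, centers, m=2.0, iters=20):
    """Stage 2: fuzzy c-means updates starting from the stage-1 centroids."""
    for _ in range(iters):
        u = memberships_for(vectors, centers, m)
        centers = [
            [sum(row[j] ** m * v[t] for row, v in zip(u, vectors))
             / sum(row[j] ** m for row in u)
             for t in range(len(vectors[0]))]
            for j in range(len(centers))
        ]
    return memberships_for(vectors, centers, m), centers

# Toy collection: two documents about cars, two about banks.
docs = [
    "the car dealer sold an automobile".split(),
    "a vehicle and a car on the road".split(),
    "the bank approved the loan".split(),
    "a lender and a bank set rates".split(),
]
# Represent each document by its concept frequencies rather than raw terms.
vectors = [[concept_frequency(d, "car"), concept_frequency(d, "bank")]
           for d in docs]
initial_centers = agglomerate(vectors, k=2)
memberships, centers = fuzzy_refine(vectors, initial_centers)
```

Each row of `memberships` gives one document's degree of belonging to each cluster, so a document can belong partly to several clusters. Seeding the soft stage from a hierarchical pass avoids the random initialization that classical partitional algorithms depend on, which is one plausible reading of the shortcomings the abstract mentions.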

Editor information

Oscar Castillo, Patricia Melin, Oscar Montiel Ross, Roberto Sepúlveda Cruz, Witold Pedrycz, Janusz Kacprzyk

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Romero, F.P., Soto, A., Olivas, J.A. (2007). A Hybrid Model for Document Clustering Based on a Fuzzy Approach of Synonymy and Polysemy. In: Castillo, O., Melin, P., Ross, O.M., Sepúlveda Cruz, R., Pedrycz, W., Kacprzyk, J. (eds) Theoretical Advances and Applications of Fuzzy Logic and Soft Computing. Advances in Soft Computing, vol 42. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72434-6_18

  • DOI: https://doi.org/10.1007/978-3-540-72434-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72433-9

  • Online ISBN: 978-3-540-72434-6

  • eBook Packages: Engineering, Engineering (R0)
