Clustering and Classification of Textual Documents Based on Fuzzy Quantitative Similarity Measure — a Practical Evaluation

  • Chapter
Advances in Web Intelligence and Data Mining

Part of the book series: Studies in Computational Intelligence ((SCI,volume 23))

Summary

Clustering enables more effective information retrieval. In practice, similar approaches are used for ranking and clustering. This paper presents a practical evaluation of a document clustering method based on a textual fuzzy similarity measure. The similarity measure was originally introduced in [12] (cf. also [13]) and later used in internet-related applications [14, 15, 18]. The experiments employed several variants of the basic clustering method [19] and used two textual databases [21, 22] that have predefined clusters and differ in how freely the content of their documents varies.
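The summary does not state the similarity measure or the clustering procedure, so the following is only an illustrative sketch of the general idea: score two texts by how many character substrings their words share, then group documents whose score exceeds a threshold. The function names (`word_similarity`, `text_similarity`, `threshold_clusters`), the normalisation, and the threshold value of 0.5 are all assumptions made for this example and should not be read as the definitions from [12] or the method of [19].

```python
def word_similarity(w1: str, w2: str) -> float:
    """Share of substrings of w1 (every length, every start position) that
    occur in w2, normalised by the longer word. Assumed form, for illustration."""
    if not w1 or not w2:
        return 0.0
    n = max(len(w1), len(w2))
    hits = 0
    for length in range(1, len(w1) + 1):
        for start in range(len(w1) - length + 1):
            if w1[start:start + length] in w2:
                hits += 1
    # A word of length n has n(n+1)/2 substrings, so identical words score 1.0.
    return 2.0 * hits / (n * n + n)


def text_similarity(d1: str, d2: str) -> float:
    """Average, over the words of d1, of the best word-level match in d2."""
    words1, words2 = d1.lower().split(), d2.lower().split()
    if not words1 or not words2:
        return 0.0
    n = max(len(words1), len(words2))
    return sum(max(word_similarity(a, b) for b in words2) for a in words1) / n


def threshold_clusters(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy grouping (hypothetical): a document joins the first cluster whose
    first member is at least `threshold` similar to it, else starts a new one."""
    clusters: list[list[int]] = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if text_similarity(docs[cluster[0]], doc) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "fuzzy clustering of documents",
    "fuzzy clustering of document",
    "linux kernel howto",
]
print(threshold_clusters(docs))
```

Because the score is a degree in the interval [0, 1] rather than a yes/no match, near-identical word forms such as "document" and "documents" still contribute strongly. Note that this sketch is not guaranteed to be symmetric in its arguments, and the greedy grouping depends on document order.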


References

  1. Baeza-Yates R, Ribeiro-Neto B (1999) Modern Information Retrieval. Addison Wesley, New York

  2. Bandemer H, Gottwald S (1995) Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. John Wiley and Sons

  3. Bezdek J C (1981) Pattern Recognition with Fuzzy Objective Function. Plenum Press, New York

  4. Bezdek J C (1980) A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2:1–8

  5. Bezdek J C, Hathaway R J, Sabin M J, Tucker W T (1987) Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs. IEEE Trans. on Systems, Man, and Cybernetics, 17:873–877

  6. Kraft D H, Chen J (2001) Integrating and Extending Fuzzy Clustering and Inferencing to Improve Text Retrieval Performance. In: Larsen H L, et al. (eds) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York

  7. Kraft D H, Chen J, Martin-Bautista M J, Amparo-Vila M (2003) Textual Information Retrieval with User Profiles using Fuzzy Clustering and Inferencing. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York

  8. Jain A K, Dubes R C (1988) Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs

  9. Larsen H L, Kacprzyk J, Zadrozny S, Andreasen T (2001) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg

  10. Lebart L, Salem A, Berry L (1998) Exploring Textual Data. Kluwer Academic Publishers

  11. Ho T B, Kawasaki S, Nguyen N B (2003) Documents Clustering using Tolerance Rough Set Model and Its Application to Information Retrieval. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York

  12. Niewiadomski A (2000) Appliance of fuzzy relations for text document comparing. Proceedings of the 5th Conference NNSC (Zakopane, Poland, June 6–10):347–352

  13. Niewiadomski A, Szczepaniak P S (2001) Intuitionistic Fuzzy Relations in Approximate Text Comparison. Published in Polish: Intuicjonistyczne relacje rozmyte w przyblizonym porownywaniu tekstow. In: Chojcan J, Leski J (eds) Zbiory rozmyte i ich zastosowania. Silesian Technical University Press, Gliwice, Poland:271–282. ISBN 83-88000-64-0

  14. Niewiadomski A, Szczepaniak P S (2002) Fuzzy Similarity in E-Commerce Domains. In: Segovia J, Szczepaniak P S, Niedzwiedzinski M (eds) E-Commerce and Intelligent Methods. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York

  15. Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzyfication of Indiscernibility Relation for Structurizing Lists of Synonyms and Stop-Lists for Search Engines. In: Rutkowski L, Siekmann J, Tadeusiewicz R, Zadeh L A (eds) Artificial Intelligence and Soft Computing ICAISC 2004. Proceedings of the Seventh International Conference on Neural Networks and Soft Computing, Zakopane, Poland, 2004. Lecture Notes in Artificial Intelligence, LNAI 3070. Springer-Verlag, Berlin, Heidelberg, New York:504–509. ISBN 3-540-22123-9

  16. Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzy Comparison of Strings in FAQ Answering. In: Abramowicz W (ed) BIS’2004. Proceedings of 7th International Conference on Business Information Systems. Wydawnictwo Akademii Ekonomicznej, Poznan, Poland:355–362. ISBN 83-7417-019-0

  17. Pal S K, Talwar V, Mitra P (2002) Web Mining in Soft Computing Frameworks: Relevance, State of the Art and Future Directions. IEEE Trans. on Neural Networks, vol. 13, no. 5

  18. Szczepaniak P S, Niewiadomski A (2003) Internet Search Based on Text Intuitionistic Fuzzy Similarity. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York

  19. Szczepaniak P S, Niewiadomski A (2003) Clustering of Documents on the Basis of Text Fuzzy Similarity. In: Abramowicz W (ed) Knowledge-Based Retrieval and Filtering from the Web. Kluwer Academic Publishers, USA:219–230. ISBN 1-4020-7523-5

  20. Zadeh L (1965) Fuzzy Sets. Information and Control, 8:338–353

  21. http://tldp.org/HOWTO/HOWTO-INDEX/categories.html

  22. http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Szczepaniak, P.S. (2006). Clustering and Classification of Textual Documents Based on Fuzzy Quantitative Similarity Measure — a Practical Evaluation. In: Last, M., Szczepaniak, P.S., Volkovich, Z., Kandel, A. (eds) Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33880-2_31

  • DOI: https://doi.org/10.1007/3-540-33880-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33879-6

  • Online ISBN: 978-3-540-33880-2

  • eBook Packages: Engineering, Engineering (R0)
