Clustering and Classification of Textual Documents Based on Fuzzy Quantitative Similarity Measure — a Practical Evaluation

Szczepaniak, Piotr S.

doi:10.1007/3-540-33880-2_31

Part of the book series: Studies in Computational Intelligence (SCI, volume 23)

Summary

Clustering enables more effective information retrieval, and in practice similar approaches are used for both ranking and clustering. This paper presents a practical evaluation of a document clustering method based on a textual fuzzy similarity measure. The measure was originally introduced in [12] (cf. also [13]) and later used in internet-related applications [14, 15, 18]. The experiments employed several variants of the basic clustering method [19] on two textual databases [21, 22] with predefined clusters, which differ in the level of freedom allowed in the contents of their documents.

References

1. Baeza-Yates R, Ribeiro-Neto B (1999) Modern Information Retrieval. Addison Wesley, New York
2. Bandemer H, Gottwald S (1995) Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. John Wiley and Sons
3. Bezdek J C (1981) Pattern Recognition with Fuzzy Objective Function. Plenum Press, New York
4. Bezdek J C (1980) A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2:1–8
5. Bezdek J C, Hathaway R J, Sabin M J, Tucker W T (1987) Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs. IEEE Trans. on Systems, Man, and Cybernetics, 17:873–877
6. Kraft D H, Chen J (2001) Integrating and Extending Fuzzy Clustering and Inferencing to Improve Text Retrieval Performance. In: Larsen H L, et al. (eds) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
7. Kraft D H, Chen J, Martin-Bautista M J, Amparo-Vila M (2003) Textual Information Retrieval with User Profiles using Fuzzy Clustering and Inferencing. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
8. Jain A K, Dubes R C (1988) Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs
9. Larsen H L, Kacprzyk J, Zadrozny S, Andreasen T (2001) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg
10. Lebart L, Salem A, Berry L (1998) Exploring Textual Data. Kluwer Academic Publisher
11. Ho T B, Kawasaki S, Nguyen N B (2003) Documents Clustering using Tolerance Rough Set Model and Its Application to Information Retrieval. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
12. Niewiadomski A (2000) Appliance of fuzzy relations for text document comparing. Proceedings of the 5th Conference NNSC (Zakopane, Poland, June 6–10):347–352
13. Niewiadomski A, Szczepaniak P S (2001) Intuitionistic Fuzzy Relations in Approximate Text Comparison. Published in Polish: Intuicjonistyczne relacje rozmyte w przyblizonym porownywaniu tekstow. In: Chojcan J, Leski J (eds) Zbiory rozmyte i ich zastosowania. Silesian Technical University Press, Gliwice, Poland:271–282; ISBN 83-88000-64-0
14. Niewiadomski A, Szczepaniak P S (2002) Fuzzy Similarity in E-Commerce Domains. In: Segovia J, Szczepaniak P S, Niedzwiedzinski M (eds) E-Commerce and Intelligent Methods. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
15. Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzyfication of Indiscernibility Relation for Structurizing Lists of Synonyms and Stop-Lists for Search Engines. In: Rutkowski L, Siekmann J, Tadeusiewicz R, Zadeh L A (eds) Artificial Intelligence and Soft Computing ICAISC 2004. Proceedings of the Seventh International Conference on Neural Networks and Soft Computing. Zakopane, Poland, 2004. Series: Lecture Notes in Artificial Intelligence (LNAI) 3070, Springer-Verlag, Berlin, Heidelberg, New York, 2004:504–509. ISBN 3-540-22123-9
16. Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzy Comparison of Strings in FAQ Answering. In: Abramowicz W (ed) BIS’2004. Proceedings of 7th International Conference on Business Information Systems. Wydawnictwo Akademii Ekonomicznej, Poznan, Poland:355–362. ISBN 83-7417-019-0
17. Pal S K, Talwar V, Mitra P (2002) Web Mining in Soft Computing Frameworks: Relevance, State of the Art and Future Directions. IEEE Trans. on Neural Networks, vol. 13, no. 5
18. Szczepaniak P S, Niewiadomski A (2003) Internet Search Based on Text Intuitionistic Fuzzy Similarity. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
19. Szczepaniak P S, Niewiadomski A (2003) Clustering of Documents on the Basis of Text Fuzzy Similarity. In: Abramowicz W (ed) Knowledge-Based Retrieval and Filtering from the Web. Kluwer Academic Publishers, USA:219–230; ISBN 1-4020-7523-5
20. Zadeh L (1965) Fuzzy Sets. Information and Control, 8:338–353
21. http://tldp.org/HOWTO/HOWTO-INDEX/categories.html
22. http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups

Author information

Authors and Affiliations

Institute of Computer Science, Technical University of Lodz, Sterlinga 16/18, 90-217, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak

Editor information

Editors and Affiliations

Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Mark Last
Institute of Computer Sciences, Technical University of Lodz, ul. Wolczanska 215, 93-1005, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak
Department of Software Engineering, ORT Braude College, POB. 78, 21982, Karmiel, Israel
Zeev Volkovich
Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Ave., ENB 118, Tampa, FL, 33620, USA
Abraham Kandel

About this chapter

Cite this chapter

Szczepaniak, P.S. (2006). Clustering and Classification of Textual Documents Based on Fuzzy Quantitative Similarity Measure — a Practical Evaluation. In: Last, M., Szczepaniak, P.S., Volkovich, Z., Kandel, A. (eds) Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33880-2_31

DOI: https://doi.org/10.1007/3-540-33880-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33879-6
Online ISBN: 978-3-540-33880-2
eBook Packages: Engineering, Engineering (R0)
