Summary
Clustering enables more effective information retrieval, and in practice similar approaches are used for both ranking and clustering. This paper presents a practical evaluation of a document clustering method based on a textual fuzzy similarity measure. The measure was originally introduced in [12] (cf. also [13]) and later used in Internet-related applications [14, 15, 18]. The experiments, which employed several variants of the basic clustering method [19], used two textual databases [21, 22] with predefined clusters and differing degrees of freedom in document content.
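The summary does not define the similarity measure or the clustering procedure, so the following is only a sketch under stated assumptions. It assumes a substring-based word similarity of the kind used in fuzzy text comparison: the number of substrings of word a (every length, every position) that also occur in word b, multiplied by 2/(N² + N), where N = max(|a|, |b|), so that identical words score 1. The document-level aggregation (best-match averaging, with the two directions combined by a minimum) and the rule that assigns a document to the most similar predefined cluster are hypothetical illustrations, not the method evaluated in the chapter. The function names `word_similarity`, `text_similarity` and `assign_to_cluster` are invented for this example.

```python
from statistics import mean


def word_similarity(a: str, b: str) -> float:
    """Assumed substring-based fuzzy similarity of word `a` to word `b`.

    Counts every substring a[start:start+length] (all lengths, all positions)
    that also occurs in `b`, then divides by N(N+1)/2, the number of such
    substrings in a word of length N = max(len(a), len(b)).
    The result lies in [0, 1]; identical words score 1.0. The measure is not
    symmetric in general.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 1.0  # two empty words are treated as identical
    hits = sum(
        1
        for length in range(1, len(a) + 1)
        for start in range(len(a) - length + 1)
        if a[start:start + length] in b
    )
    return 2.0 * hits / (n * n + n)


def text_similarity(doc_a: str, doc_b: str) -> float:
    """Hypothetical document-level aggregation (not taken from the chapter).

    Each word is scored by its best match in the other document, the scores
    are averaged, and the two directions are combined with min(), i.e. the
    minimum t-norm (a fuzzy AND).
    """
    words_a = doc_a.lower().split()
    words_b = doc_b.lower().split()
    if not words_a or not words_b:
        return 0.0

    def directed(xs: list[str], ys: list[str]) -> float:
        return mean(max(word_similarity(x, y) for y in ys) for x in xs)

    return min(directed(words_a, words_b), directed(words_b, words_a))


def assign_to_cluster(doc: str, clusters: dict[str, list[str]]) -> str:
    """Hypothetical rule: pick the predefined cluster whose member documents
    are, on average, most similar to `doc`."""
    return max(
        clusters,
        key=lambda label: mean(text_similarity(doc, d) for d in clusters[label]),
    )


if __name__ == "__main__":
    # Toy data standing in for the predefined clusters of a test collection.
    clusters = {
        "networking": ["network routing", "firewall configuration"],
        "printing": ["printer spooler", "print queue"],
    }
    print(assign_to_cluster("routing network", clusters))  # prints networking
```

Because the assumed word measure is asymmetric (for example, "aa" scores 2/3 against "a", while "a" scores 1/3 against "aa"), the sketch combines both directions with a minimum; averaging the two directions would be an equally plausible design choice.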
References
[1] Baeza-Yates R, Ribeiro-Neto B (1999) Modern Information Retrieval. Addison-Wesley, New York
[2] Bandemer H, Gottwald S (1995) Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. John Wiley and Sons
[3] Bezdek J C (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York
[4] Bezdek J C (1980) A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2:1–8
[5] Bezdek J C, Hathaway R J, Sabin M J, Tucker W T (1987) Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs. IEEE Trans. on Systems, Man, and Cybernetics, 17:873–877
[6] Kraft D H, Chen J (2001) Integrating and Extending Fuzzy Clustering and Inferencing to Improve Text Retrieval Performance. In: Larsen H L, et al. (eds) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
[7] Kraft D H, Chen J, Martin-Bautista M J, Amparo-Vila M (2003) Textual Information Retrieval with User Profiles using Fuzzy Clustering and Inferencing. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
[8] Jain A K, Dubes R C (1988) Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs
[9] Larsen H L, Kacprzyk J, Zadrozny S, Andreasen T (2001) Flexible Query Answering Systems. Physica-Verlag, A Springer-Verlag Company, Heidelberg
[10] Lebart L, Salem A, Berry L (1998) Exploring Textual Data. Kluwer Academic Publishers
[11] Ho T B, Kawasaki S, Nguyen N B (2003) Documents Clustering using Tolerance Rough Set Model and Its Application to Information Retrieval. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
[12] Niewiadomski A (2000) Appliance of fuzzy relations for text document comparing. Proceedings of the 5th Conference NNSC (Zakopane, Poland, June 6–10):347–352
[13] Niewiadomski A, Szczepaniak P S (2001) Intuitionistic Fuzzy Relations in Approximate Text Comparison. Published in Polish: Intuicjonistyczne relacje rozmyte w przybliżonym porównywaniu tekstów. In: Chojcan J, Leski J (eds) Zbiory rozmyte i ich zastosowania [Fuzzy Sets and Their Applications]. Silesian Technical University Press, Gliwice, Poland:271–282; ISBN 83-88000-64-0
[14] Niewiadomski A, Szczepaniak P S (2002) Fuzzy Similarity in E-Commerce Domains. In: Segovia J, Szczepaniak P S, Niedzwiedzinski M (eds) E-Commerce and Intelligent Methods. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
[15] Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzification of Indiscernibility Relation for Structurizing Lists of Synonyms and Stop-Lists for Search Engines. In: Rutkowski L, Siekmann J, Tadeusiewicz R, Zadeh L A (eds) Artificial Intelligence and Soft Computing: ICAISC 2004. Proceedings of the Seventh International Conference on Neural Networks and Soft Computing, Zakopane, Poland, 2004. Lecture Notes in Artificial Intelligence, LNAI 3070. Springer-Verlag, Berlin, Heidelberg, New York:504–509. ISBN 3-540-22123-9
[16] Niewiadomski A, Kryger P, Szczepaniak P S (2004) Fuzzy Comparison of Strings in FAQ Answering. In: Abramowicz W (ed) BIS 2004: Proceedings of the 7th International Conference on Business Information Systems. Wydawnictwo Akademii Ekonomicznej, Poznan, Poland:355–362. ISBN 83-7417-019-0
[17] Pal S K, Talwar V, Mitra P (2002) Web Mining in Soft Computing Frameworks: Relevance, State of the Art and Future Directions. IEEE Trans. on Neural Networks, 13(5)
[18] Szczepaniak P S, Niewiadomski A (2003) Internet Search Based on Text Intuitionistic Fuzzy Similarity. In: Szczepaniak P S, Segovia J, Kacprzyk J, Zadeh L (eds) Intelligent Exploration of the Web. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York
[19] Szczepaniak P S, Niewiadomski A (2003) Clustering of Documents on the Basis of Text Fuzzy Similarity. In: Abramowicz W (ed) Knowledge-Based Retrieval and Filtering from the Web. Kluwer Academic Publishers, USA:219–230; ISBN 1-4020-7523-5
[20] Zadeh L (1965) Fuzzy Sets. Information and Control, 8:338–353
[21] http://tldp.org/HOWTO/HOWTO-INDEX/categories.html
[22] http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Szczepaniak, P.S. (2006). Clustering and Classification of Textual Documents Based on Fuzzy Quantitative Similarity Measure — a Practical Evaluation. In: Last, M., Szczepaniak, P.S., Volkovich, Z., Kandel, A. (eds) Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33880-2_31
DOI: https://doi.org/10.1007/3-540-33880-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33879-6
Online ISBN: 978-3-540-33880-2
eBook Packages: Engineering, Engineering (R0)