Documents Clustering Using Tolerance Rough Set Model and Its Application to Information Retrieval

Ho, Tu Bao; Kawasaki, Saori; Nguyen, Ngoc Binh

doi:10.1007/978-3-7908-1772-0_12

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 111))

Abstract

Clustering is a powerful tool for analyzing and finding useful information in text collections. However, document clustering is a difficult problem because documents are unstructured and textual. As a consequence, the quality of document clustering depends not only on the clustering algorithm but also on the document representation model. In this work we introduce a tolerance rough set model (TRSM) for representing documents as an alternative way of capturing semantic relatedness between documents. Using TRSM we develop two clustering algorithms for documents, one hierarchical and one nonhierarchical, and apply them to information retrieval. The TRSM clustering methods and the TRSM cluster-based information retrieval method are evaluated and validated by comparative experiments on test collections.
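
The abstract names TRSM without defining it. As a rough illustration only, the sketch below assumes the commonly described formulation: two terms are tolerant of each other when they co-occur in at least a threshold number of documents, and a document's upper approximation collects every term whose tolerance class overlaps the document. The function names, the threshold name `theta`, and the toy collection are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents.

    Assumption: tolerance is defined by document co-occurrence counts.
    """
    cooc = defaultdict(int)
    for terms in docs:
        # Count each unordered pair of terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    vocab = set().union(*docs)
    classes = {t: {t} for t in vocab}  # every term is tolerant of itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares a term with the document."""
    return {t for t, cls in classes.items() if cls & doc}


# Toy collection (hypothetical data): each document is a set of index terms.
docs = [
    {"rough", "set", "clustering"},
    {"rough", "set", "retrieval"},
    {"clustering", "retrieval"},
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation({"rough", "clustering"}, classes)
```

In this toy collection only "rough" and "set" co-occur twice, so a document that mentions "rough" but not "set" is still represented with "set" after enrichment. That is the sense in which such a representation can relate documents that share few literal terms.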

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292, Japan
Tu Bao Ho & Saori Kawasaki
Hanoi University of Technology, DaiCo Viet Road, Hanoi, Vietnam
Ngoc Binh Nguyen

Editor information

Editors and Affiliations

Institute of Computer Science, Technical University of Lodz, ul. Sterlinga 16/18, 90-217, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak & Janusz Kacprzyk
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660, Madrid, Spain
Javier Segovia
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh

About this chapter

Cite this chapter

Ho, T.B., Kawasaki, S., Nguyen, N.B. (2003). Documents Clustering Using Tolerance Rough Set Model and Its Application to Information Retrieval. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_12

DOI: https://doi.org/10.1007/978-3-7908-1772-0_12
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2519-0
Online ISBN: 978-3-7908-1772-0
eBook Packages: Springer Book Archive
