An Effective Rough Set-Based Method for Text Classification

Bao, Yongguang; Asai, Daisuke; Du, Xiaoyong; Yamada, Kazutaka; Ishii, Naohiro

doi:10.1007/978-3-540-45080-1_75


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning


Abstract

A central problem in text classification for information filtering and information retrieval (IF/IR) is the high dimensionality of the data. We propose a technique based on Rough Sets theory to alleviate it. Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, thereby reducing the dimensionality of the keyword vectors. In addition, we generate several reduct bases for classifying new objects, in the expectation that combining the answers of multiple reduct bases yields better performance. To obtain tidy and effective rules, we apply value reduction to produce the final rules. This paper describes the proposed technique and provides experimental results.
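The idea of reducing keyword vectors to a reduct and combining several reducts can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy decision table, the function names, and the Johnson-style greedy heuristic over the discernibility matrix are all assumptions chosen for illustration, and value reduction is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy decision table: keyword presence (0/1) -> class label.
KEYWORDS = ["ball", "goal", "vote", "tax", "team"]
DOCS = [
    ((1, 1, 0, 0, 1), "sport"),
    ((1, 0, 0, 0, 1), "sport"),
    ((0, 0, 1, 1, 0), "politics"),
    ((0, 0, 1, 0, 1), "politics"),
]

def discernibility(docs):
    """For every pair of documents from different classes, collect the
    set of keyword indices on which the two documents differ."""
    entries = []
    for (xa, ca), (xb, cb) in combinations(docs, 2):
        if ca != cb:
            diff = frozenset(i for i, (u, v) in enumerate(zip(xa, xb)) if u != v)
            if diff:
                entries.append(diff)
    return entries

def greedy_reduct(docs, banned=frozenset()):
    """Greedy cover of the discernibility entries (assumed heuristic),
    followed by pruning so that no chosen keyword is redundant.
    Returns None if the banned keywords make some pair indiscernible."""
    entries = [e - banned for e in discernibility(docs)]
    if any(not e for e in entries):
        return None
    chosen, remaining = set(), list(entries)
    while remaining:
        counts = Counter(i for e in remaining for i in e)
        best = max(sorted(counts), key=lambda i: counts[i])
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    for i in sorted(chosen):
        if all(e & (chosen - {i}) for e in entries):
            chosen.discard(i)
    return frozenset(chosen)

def several_reducts(docs, limit=3):
    """Derive alternative reducts by banning each keyword of the first one."""
    first = greedy_reduct(docs)
    found = [first]
    for i in sorted(first):
        alt = greedy_reduct(docs, banned=frozenset({i}))
        if alt is not None and alt not in found:
            found.append(alt)
        if len(found) >= limit:
            break
    return found

def classify(docs, reducts, x):
    """Each reduct casts one vote: the class of the first training
    document that agrees with x on all keywords of that reduct."""
    votes = Counter()
    for r in reducts:
        for xa, ca in docs:
            if all(xa[i] == x[i] for i in r):
                votes[ca] += 1
                break
    return votes.most_common(1)[0][0] if votes else None

reducts = several_reducts(DOCS)
print([sorted(KEYWORDS[i] for i in r) for r in reducts])
print(classify(DOCS, reducts, (1, 0, 0, 0, 0)))
```

On this toy table a single keyword already separates the two classes, so each reduct has one element. Voting across reducts is one simple way to combine their answers; the paper may combine them differently.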



Author information

Authors and Affiliations

Department of Intelligence and Computer Science, Nagoya Institute of Technology, Nagoya, 466-8555, Japan
Yongguang Bao, Daisuke Asai, Kazutaka Yamada & Naohiro Ishii
School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du


Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin


Cite this paper

Bao, Y., Asai, D., Du, X., Yamada, K., Ishii, N. (2003). An Effective Rough Set-Based Method for Text Classification. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_75


DOI: https://doi.org/10.1007/978-3-540-45080-1_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive
