Abstract
A central problem in text classification for IF/IR is the high dimensionality of the data. To address it, we propose a technique based on Rough Sets theory. Given corpora of documents and a training set of classified documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, reducing the dimensionality of the keyword vectors. In addition, we generate several reduct bases for classifying new objects, in the expectation that combining the answers of multiple reduct bases results in better performance. To obtain tidy and effective rules, we apply value reduction to produce the final rules. This paper describes the proposed technique and provides experimental results.
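To illustrate what locating a minimal set of distinguishing keywords can look like, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes binary keyword vectors and uses a simple greedy heuristic over discernibility entries (the sets of keywords on which two documents of different classes differ). The function names, the toy keyword list, and the class labels are all hypothetical.

```python
from itertools import combinations


def discernibility_entries(rows, labels):
    """For each pair of documents with different classes, collect the set of
    keyword indices on which the two documents differ."""
    entries = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] != labels[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # an empty set means the pair cannot be told apart
                entries.append(diff)
    return entries


def greedy_reduct(rows, labels):
    """Greedy heuristic (an assumption, not the paper's method): repeatedly
    pick the keyword that discerns the most still-undiscerned pairs, then
    drop any keyword that turns out to be redundant."""
    all_entries = discernibility_entries(rows, labels)
    remaining = list(all_entries)
    chosen = []
    while remaining:
        counts = {}
        for entry in remaining:
            for k in entry:
                counts[k] = counts.get(k, 0) + 1
        # Ties are broken in favour of the lowest keyword index.
        best = max(sorted(counts), key=lambda k: counts[k])
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    # Pruning pass: remove keywords not needed to cover every entry.
    for k in list(chosen):
        trial = [c for c in chosen if c != k]
        if all(any(c in e for c in trial) for e in all_entries):
            chosen = trial
    return sorted(chosen)


# Hypothetical toy data: one row per document, one column per keyword.
keywords = ["ball", "vote", "team", "tax"]
rows = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
labels = ["sports", "sports", "politics", "politics"]

reduct = greedy_reduct(rows, labels)
print([keywords[k] for k in reduct])
```

In this toy table a single keyword is enough to separate the two classes, so the four-dimensional keyword vectors shrink to one dimension. A greedy search returns one small keyword set but does not guarantee the smallest possible one; generating several such sets and combining their answers is the kind of multi-reduct strategy the abstract refers to.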
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bao, Y., Asai, D., Du, X., Yamada, K., Ishii, N. (2003). An Effective Rough Set-Based Method for Text Classification. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_75
DOI: https://doi.org/10.1007/978-3-540-45080-1_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1