Abstract
This paper describes a classifier for web documents in the fields of Information Technology. It uses a Support Vector Machine (SVM) to learn a model, which is constructed from the training sets and their representative terms. Automatic text classification is needed to handle very large document collections and so reduce information overload. An SVM is a model whose output is calculated as a weighted sum of kernel function outputs. The basic idea is to exploit, with simple statistical methods, the distribution of the meanings of representative terms across the thematically coherent texts of each category. The vector-space model is used to represent the documents in each category, with a feature selection scheme based on TF-IDF. We apply to the feature selection a category factor, which represents the effect of a term within a category. Experiments show the categorization results and their correlation with vector length.
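The two ideas named in the abstract, TF-IDF term scoring adjusted by a category factor and an SVM output formed as a weighted sum of kernel outputs, can be illustrated with a minimal sketch. The abstract does not define the category factor, so the version here (the share of a term's document frequency that falls inside the target category) is only an assumed stand-in. The function names and data layout are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_with_category_factor(docs, labels, category):
    """Score the terms of one category as TF * IDF * category factor.

    docs: list of token lists; labels: one category name per document.
    ASSUMPTION: the category factor is df_in_category / df_overall.
    The paper's actual definition is not given in the abstract.
    """
    n_docs = len(docs)
    df = Counter()      # document frequency over the whole corpus
    df_cat = Counter()  # document frequency inside the target category
    tf_cat = Counter()  # raw term counts inside the target category
    for tokens, label in zip(docs, labels):
        unique = set(tokens)
        df.update(unique)
        if label == category:
            df_cat.update(unique)
            tf_cat.update(tokens)

    total = sum(tf_cat.values())
    scores = {}
    for term, count in tf_cat.items():
        tf = count / total                 # relative frequency in the category
        idf = math.log(n_docs / df[term])  # rarer terms get a higher weight
        cf = df_cat[term] / df[term]       # assumed category factor
        scores[term] = tf * idf * cf
    return scores


def linear_kernel(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(p * q for p, q in zip(a, b))


def svm_decision(x, support_vectors, coeffs, bias, kernel=linear_kernel):
    """Weighted sum of kernel outputs plus a bias term.

    coeffs[i] stands for alpha_i * y_i of the i-th support vector.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias
```

Under these assumptions, a term that is frequent in one category but also spread across other categories is penalised twice, once by the IDF and once by the category factor, so it is less likely to be kept as a representative term. The sign of `svm_decision` would then give the predicted class for a document vector built from the selected terms.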
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, Y. (2005). Representative Term Based Feature Selection Method for SVM Based Document Classification. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552413_9
DOI: https://doi.org/10.1007/11552413_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28894-7
Online ISBN: 978-3-540-31983-2
eBook Packages: Computer Science, Computer Science (R0)