Abstract
In this paper, we propose a probabilistic approach to feature selection for multi-class text categorization. Specifically, we treat the document class and the occurrence of each feature as events, compute the probability of occurrence of each feature using the law of total probability, and use these values as a ranking criterion. Experiments on the Reuters-2000 collection show that the proposed method can yield better performance than information gain and chi-square (χ²), two well-known feature selection methods.
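The ranking idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact estimator, so everything below is an assumption made for illustration: the function name `rank_features_by_total_probability`, the use of per-class document frequencies to estimate P(t | c), and the optional `class_priors` argument are all hypothetical and may differ from the authors' formulation. The sketch scores each feature t by P(t) = Σ_c P(t | c) · P(c) and sorts features by that score.

```python
from collections import Counter, defaultdict


def rank_features_by_total_probability(docs, labels, class_priors=None):
    """Rank features by P(t) = sum over classes c of P(t | c) * P(c).

    docs: list of token lists; labels: parallel list of class labels.
    class_priors: optional dict mapping class -> P(c). If omitted, priors
    are estimated as relative class frequencies (an assumption of this sketch).
    """
    n_docs = len(docs)
    class_counts = Counter(labels)

    # df[c][t] = number of documents of class c that contain feature t
    df = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        df[c].update(set(tokens))

    if class_priors is None:
        class_priors = {c: n / n_docs for c, n in class_counts.items()}

    vocab = set().union(*(counter.keys() for counter in df.values()))

    scores = {}
    for t in vocab:
        # Law of total probability, with P(t | c) estimated as df[c][t] / N_c
        scores[t] = sum(
            class_priors[c] * df[c][t] / class_counts[c] for c in class_counts
        )

    # Highest probability first; ties broken alphabetically for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only
docs = [["oil", "price"], ["oil", "export"], ["wheat", "price"], ["wheat"]]
labels = ["crude", "crude", "grain", "grain"]
ranking = rank_features_by_total_probability(docs, labels)
```

Note that when both P(c) and P(t | c) are plain relative frequencies, this score reduces to the overall document frequency of t; supplying different priors through `class_priors` (for example, uniform priors on an imbalanced collection) changes the ranking.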
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, K., Lu, BL., Uchiyama, M., Isahara, H. (2007). A Probabilistic Approach to Feature Selection for Multi-class Text Categorization. In: Liu, D., Fei, S., Hou, ZG., Zhang, H., Sun, C. (eds) Advances in Neural Networks – ISNN 2007. ISNN 2007. Lecture Notes in Computer Science, vol 4491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72383-7_153
DOI: https://doi.org/10.1007/978-3-540-72383-7_153
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72382-0
Online ISBN: 978-3-540-72383-7
eBook Packages: Computer Science, Computer Science (R0)