Abstract
Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their content. To classify text documents, we must evaluate the value of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, known as TF*IDF. Because a word may play different roles in documents of different categories, its value should be measured with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories, together with a new framework for text categorization. To verify the effectiveness of the new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that the new method yields a significant improvement on all three collections.
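The abstract does not specify the new category-specific weighting, so the sketch below illustrates only the baseline it improves on: TF*IDF document vectors classified with k-NN. It is a minimal sketch under stated assumptions, not the authors' method. The choices of raw term frequency, idf = log(N / df), cosine similarity, and similarity-weighted voting among the k nearest neighbours are assumptions, and the function names (`tfidf_vectors`, `vectorize`, `knn_classify`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF*IDF vector (dict: term -> weight) per tokenized document.

    Assumption: raw term frequency and idf = log(N / df); other variants exist.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def vectorize(doc, idf):
    """Weight a new document with the training idf; unseen terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train_vecs, labels, k=3):
    """Pick the category whose members among the k nearest neighbours
    have the largest summed cosine similarity to the query (assumed voting rule)."""
    neighbours = sorted(
        ((cosine(query, v), lbl) for v, lbl in zip(train_vecs, labels)),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, lbl in neighbours:
        scores[lbl] += sim
    return scores.most_common(1)[0][0]


# Toy usage with invented training data.
train = [
    ["stock", "market", "shares"],
    ["market", "trade", "shares"],
    ["football", "goal", "match"],
    ["match", "team", "goal"],
]
labels = ["finance", "finance", "sports", "sports"]
vecs, idf = tfidf_vectors(train)
print(knn_classify(vectorize(["goal", "team"], idf), vecs, labels, k=3))  # sports
```

In this baseline a term such as "goal" gets the same idf no matter which category is being scored. That single, category-independent weight is the limitation the abstract argues against when it proposes measuring word importance per category.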
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deng, Z., Zhang, M. (2005). Improving Text Categorization Using the Importance of Words in Different Categories. In: Hao, Y., et al. Computational Intelligence and Security. CIS 2005. Lecture Notes in Computer Science, vol 3801. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596448_67
DOI: https://doi.org/10.1007/11596448_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30818-8
Online ISBN: 978-3-540-31599-5
eBook Packages: Computer Science, Computer Science (R0)