Improving Text Categorization Using the Importance of Words in Different Categories

Deng, Zhihong; Zhang, Ming

doi:10.1007/11596448_67

Zhihong Deng & Ming Zhang

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3801)

Abstract

Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their context. To classify text documents, we must evaluate the values of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, called TF*IDF for short. Since a word plays a different role in documents of different categories, we should measure the value of the word with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories and a new framework for text categorization. To verify the efficiency of our new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that our new method yields a significant improvement on all three text collections.
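As an illustration of the baseline weighting mentioned in the abstract, the following is a minimal sketch of TF*IDF, assuming raw term counts for TF and log(N / df) for IDF. It is one common variant and not necessarily the exact formulation used in the paper, and it does not implement the category-dependent measure the paper proposes. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (hypothetical), already tokenized.
docs = [
    ["stock", "market", "stock"],
    ["market", "football"],
    ["football", "goal", "goal"],
]
w = tf_idf(docs)
```

Under this variant a word receives a single weight per document regardless of category, and a word that occurs in every document gets weight zero. That category-independent weighting is the limitation the abstract points to.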

Author information

Authors and Affiliations

National Laboratory on Machine Perception, Peking University, Beijing, 100871, China
Zhihong Deng
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Zhihong Deng & Ming Zhang

Editor information

Editors and Affiliations

Microelectronic Institute, Xidian University, 710071, Xi’an, China
Yue Hao
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
School of Computer Science and Technology, Xidian University, Xi’an, China
Yuping Wang
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
Life Science Research Center, School of Electronic Engineering, Xidian University, 710071, Xi’an, Shaanxi, China
Licheng Jiao
Key Laboratory of Computer Networks and Information Security (Ministry of Education), Xidian University, 710071, Xi’an, China
Jianfeng Ma
National Laboratory of Antennas and Microwave Technology, Xidian University, 710071, Xi’an, Shaanxi, P.R. China
Yong-Chang Jiao

About this paper

Cite this paper

Deng, Z., Zhang, M. (2005). Improving Text Categorization Using the Importance of Words in Different Categories. In: Hao, Y., et al. Computational Intelligence and Security. CIS 2005. Lecture Notes in Computer Science, vol 3801. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596448_67

DOI: https://doi.org/10.1007/11596448_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30818-8
Online ISBN: 978-3-540-31599-5
eBook Packages: Computer Science, Computer Science (R0)
