Context-Based Term Frequency Assessment for Text Classification

Liu, Rey-Long

doi:10.1007/978-3-540-89197-0_98

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the contexts (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA can easily be combined with various text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither large amounts of memory nor domain-specific knowledge. Experimental results show that CTFA can successfully enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on the Reuters and Newsgroups data.
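
The abstract does not describe how CTFA recognizes term contexts, so the following is only a minimal Python sketch of the general pipeline it outlines, under stated assumptions: each occurrence of a term in a test document is re-weighted according to its neighboring terms, and the re-assessed frequencies are passed to an unchanged frequency-based classifier. The context rule (an occurrence of t counts `boost` instead of 1 when a neighbor u within `window` positions forms a pair (t, u) in a given set `context_pairs`), the function names, and the hand-written category centroids are all hypothetical and are not the CTFA algorithm from the paper. The classifier is a simple Rocchio-style nearest-centroid rule using cosine similarity.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Plain term frequencies: every occurrence of a term counts 1."""
    return Counter(tokens)


def reassess_frequencies(tokens, context_pairs, window=2, boost=2.0):
    """Hypothetical context-based re-assessment (NOT the CTFA algorithm).

    An occurrence of term t counts `boost` instead of 1 when some other
    token within `window` positions forms a pair (t, u) in `context_pairs`.
    """
    tf = Counter()
    for i, t in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbors = tokens[lo:i] + tokens[i + 1:hi]  # context excluding t itself
        supported = any((t, u) in context_pairs for u in neighbors)
        tf[t] += boost if supported else 1.0
    return tf


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_predict(tf, centroids):
    """Rocchio-style decision: pick the category whose centroid is most
    cosine-similar to the document's term-frequency vector."""
    return max(centroids, key=lambda c: cosine(tf, centroids[c]))


if __name__ == "__main__":
    # Hypothetical category centroids and context pairs, for illustration only.
    centroids = {
        "finance": {"bank": 1.0, "interest": 1.0, "rate": 1.0},
        "geography": {"bank": 1.0, "river": 1.0, "water": 1.0},
    }
    pairs = {("bank", "interest")}
    doc = ["bank", "raised", "interest", "rate"]
    for name, tf in [("plain", term_frequencies(doc)),
                     ("reassessed", reassess_frequencies(doc, pairs))]:
        print(name, dict(tf), rocchio_predict(tf, centroids))
```

Because `rocchio_predict` only sees a term-to-frequency mapping, replacing `term_frequencies` with `reassess_frequencies` requires no change to the classifier itself, which is the kind of decoupling the abstract claims for CTFA.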

Author information

Authors and Affiliations

Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, R.O.C.
Rey-Long Liu

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-1292, Nomi, Japan
Tu-Bao Ho
Department of Computer Science & Technology, Nanjing University, 22 Hankou Road, Nanjing 210093, China
Zhi-Hua Zhou

About this paper

Cite this paper

Liu, RL. (2008). Context-Based Term Frequency Assessment for Text Classification. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_98

DOI: https://doi.org/10.1007/978-3-540-89197-0_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)
