Abstract
Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the context (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA can easily work with the various kinds of text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither a large amount of memory nor domain-specific knowledge. Experimental results show that CTFA can successfully enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on Reuters and Newsgroups data.
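The idea of re-assessing term frequencies from term contexts can be illustrated with a small sketch. This is not the CTFA algorithm itself, whose details the abstract does not give. It is a hypothetical weighting scheme built on two assumptions: a set of terms known to be associated with a category (`category_terms`) and a fixed context window (`window`). The function name and both parameters are invented for illustration. Each occurrence of a term contributes the fraction of its neighboring terms that belong to the category, rather than a flat count of 1.

```python
from collections import Counter


def context_weighted_tf(tokens, category_terms, window=2):
    """Return re-assessed term frequencies for one test document.

    Hypothetical scheme, not the paper's method: every occurrence of a
    term adds a weight between 0 and 1, equal to the fraction of its
    neighbours (within `window` positions on either side) that are in
    `category_terms`.
    """
    tf = Counter()
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Neighbouring terms on both sides, excluding the term itself.
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        if neighbours:
            support = sum(1 for n in neighbours if n in category_terms)
            weight = support / len(neighbours)
        else:
            # A one-term document offers no context to assess.
            weight = 0.0
        tf[term] += weight
    return dict(tf)


tokens = ["bank", "loan", "interest", "river", "bank"]
finance_terms = {"loan", "interest", "rate"}  # assumed category vocabulary
reassessed = context_weighted_tf(tokens, finance_terms, window=1)
raw = Counter(tokens)
```

In this toy document the raw count of "bank" is 2, but the second occurrence sits next to "river" only and contributes nothing, so its re-assessed frequency is 1.0. Because the output is still a term-to-frequency map, it could in principle be passed to any classifier that works from term frequencies, which is the compatibility property the abstract describes.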
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, RL. (2008). Context-Based Term Frequency Assessment for Text Classification. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_98
DOI: https://doi.org/10.1007/978-3-540-89197-0_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science (R0)