Context-Based Term Frequency Assessment for Text Classification

  • Conference paper
PRICAI 2008: Trends in Artificial Intelligence (PRICAI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Abstract

Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the context (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA can work with the various kinds of text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither large amounts of memory nor domain-specific knowledge. Experimental results show that CTFA can enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on Reuters and Newsgroups data.
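
The abstract does not state how CTFA computes its re-assessed frequencies, so the following is only an illustrative sketch of the general idea of letting neighboring terms adjust a term's frequency in a test document. The function name `context_weighted_tf`, the `window` parameter, the `category_terms` set, and the weighting rule (1 plus the fraction of neighbors that are category terms) are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def context_weighted_tf(tokens, category_terms, window=2):
    """Hypothetical context-based term frequency re-assessment.

    Each occurrence of a term counts as 1 plus the fraction of its
    neighbors (within `window` positions) that belong to
    `category_terms`. This weighting rule is an assumption for
    illustration, not the formula used by CTFA.
    """
    tf = Counter()
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Neighbors are the tokens around position i, excluding i itself.
        neighbors = tokens[lo:i] + tokens[i + 1:hi]
        if neighbors:
            support = sum(1 for n in neighbors if n in category_terms) / len(neighbors)
        else:
            support = 0.0
        tf[term] += 1.0 + support
    return dict(tf)


if __name__ == "__main__":
    doc = ["bank", "loan", "river", "bank"]
    finance_terms = {"loan", "interest"}
    print(context_weighted_tf(doc, finance_terms, window=1))
```

Because the output is still a mapping from terms to (real-valued) frequencies, it could in principle be fed to any classifier that consumes term frequencies, which is the compatibility property the abstract emphasizes.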

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, RL. (2008). Context-Based Term Frequency Assessment for Text Classification. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_98

  • DOI: https://doi.org/10.1007/978-3-540-89197-0_98

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89196-3

  • Online ISBN: 978-3-540-89197-0

  • eBook Packages: Computer Science, Computer Science (R0)
