
A Term Weighting Approach for Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

Representative words in a document are commonly identified and discriminated by the statistical distribution of their frequencies. We hypothesize that estimating a confidence measure for terms through content-based document analysis yields better performance than relying on the parametric assumptions of the standard frequency-based method. In this paper, we propose a new term weighting approach to replace frequency-based probabilistic methods. Experiments with naïve Bayes classifiers showed that our approach improved on the frequency-based method at each evaluation point.
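The abstract contrasts the proposed weighting with the standard frequency-based method as used in naïve Bayes classification. The paper's own confidence-based weighting is not described here, so the sketch below shows only that frequency-based baseline. It is a multinomial naïve Bayes classifier in which each term contributes in proportion to its raw frequency, with add-alpha (Laplace) smoothing. The function names `train_nb` and `classify`, the smoothing parameter `alpha`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes model on tokenized documents.

    Illustrative baseline only: terms are weighted by raw frequency,
    not by the paper's (unspecified here) confidence measure.
    """
    class_doc_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> total frequency
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    # Class prior P(c) estimated from document counts.
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    # Smoothed term likelihood P(t | c) = (tf(t, c) + alpha) / (sum_t tf(t, c) + alpha * |V|).
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(model, tokens):
    """Return the class with the highest log-posterior for a tokenized document."""
    log_prior, log_likelihood, vocab = model
    tf = Counter(tokens)
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for t, freq in tf.items():
            if t in vocab:  # out-of-vocabulary terms are ignored
                score += freq * log_likelihood[c][t]  # weight = raw term frequency
        scores[c] = score
    return max(scores, key=scores.get)


# Toy usage with hypothetical data.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["stock", "market", "price"],
    ["price", "market", "trade"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train_nb(docs, labels)
print(classify(model, ["team", "goal"]))  # sports
```

In this baseline a term's contribution to the score grows linearly with how often it occurs. That is the kind of frequency-driven weighting the abstract proposes to replace; how the authors' confidence measure would plug into the classifier is not specified in this excerpt.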





Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, KC., Kang, SS., Hahn, KS. (2005). A Term Weighting Approach for Text Categorization. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_66


  • DOI: https://doi.org/10.1007/11562382_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science (R0)
