Dynamic Category Profiling for Text Filtering and Classification

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

Information is often represented as text and classified into categories for efficient browsing, retrieval, and dissemination. Unfortunately, automatic classifiers may make many misclassifications. One reason is that the documents used to train the classifiers come mainly from the categories themselves, which leads the classifiers to derive category profiles that distinguish each category from the others rather than measure the extent to which a document’s content overlaps that of a category. To tackle the problem, we present a technique, DP4FC, that helps various classifiers improve the mining of category profiles. Upon receiving a document, DP4FC helps create dynamic category profiles with respect to that document and, accordingly, helps make proper filtering and classification decisions. Theoretical analysis and empirical results show that DP4FC may make a classifier’s performance both better and more stable.
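
The abstract does not give DP4FC's algorithm, so the following is only an illustrative sketch of the distinction it draws: scoring how much of a document's content overlaps a category, and rejecting (filtering out) documents that overlap no category enough. The function names, the whitespace tokenizer, the set-based profiles, and the 0.5 threshold are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def term_counts(text):
    """Crude lowercase whitespace tokenizer (an assumption, not the paper's preprocessing)."""
    return Counter(text.lower().split())


def overlap_score(doc_text, category_terms):
    """Fraction of the document's term occurrences found in the category's vocabulary."""
    counts = term_counts(doc_text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for term, n in counts.items() if term in category_terms)
    return covered / total


def filter_and_classify(doc_text, profiles, threshold=0.5):
    """Return the categories whose overlap meets the threshold.

    An empty list means the document is filtered out as belonging to no category.
    The threshold value is hypothetical.
    """
    return sorted(
        name for name, terms in profiles.items()
        if overlap_score(doc_text, terms) >= threshold
    )


# Hypothetical toy profiles: each category is just a set of terms.
profiles = {
    "sports": {"match", "team", "goal", "league"},
    "finance": {"stock", "market", "bank", "rate"},
}

print(filter_and_classify("team goal weather", profiles))
print(filter_and_classify("weather rain cloud", profiles))
```

A purely discriminative classifier would assign the second document to whichever category it resembles least badly. Scoring overlap directly lets it be rejected instead, which is the filtering decision the abstract refers to.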

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, RL. (2006). Dynamic Category Profiling for Text Filtering and Classification. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_31

  • DOI: https://doi.org/10.1007/11731139_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
