Text Categorisation Using a Partial-Matching Strategy

  • Conference paper
Computational Methods in Neural Modeling (IWANN 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2686)

Abstract

This paper presents a family of rule learners whose rules are applied according to a partial-matching criterion based on different purity measures. The behavior of these rule learners is tested on a Text Categorisation problem. To illustrate the advantages of each learner, the MDL-based method of C4.5 is replaced by a pruning process whose performance relies on an estimate of the quality of the rules. Empirical results show that, in general, inducing partial-matching rules yields more compact rule sets without degrading performance measured in terms of microaveraged F1, one of the most common performance measures in Information Retrieval tasks. In particular, some purity measures produce significantly fewer rules than C4.5 while performance measured with F1 is not degraded.
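For readers unfamiliar with the microaveraged F1 measure named in the abstract, the following is a minimal Python sketch of its standard definition: true positives, false positives, and false negatives are pooled over all categories before F1 is computed. It is not taken from the paper. The function name `micro_f1`, the category labels, and the counts are all hypothetical and serve only as illustration.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Microaveraged F1 from per-category (tp, fp, fn) counts.

    The counts are summed over all categories first, so frequent
    categories weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    denom = 2 * tp + fp + fn
    # F1 = 2PR / (P + R) simplifies to 2*tp / (2*tp + fp + fn).
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-category counts (tp, fp, fn); not results from the paper.
example_counts = {
    "earn": (90, 10, 5),
    "acq": (40, 5, 15),
    "grain": (5, 0, 10),
}
score = micro_f1(example_counts)
print(f"micro-F1 = {score:.4f}")
```

Because the counts are pooled, a classifier that does poorly on a small category (such as the hypothetical "grain" above) is penalised only lightly; macroaveraging, which averages per-category F1 values, would weigh every category equally.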

The research reported in this paper has been supported in part under MCyT and Feder grant TIC2001-3579.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ranilla, J., Díaz, I., Fernández, J. (2003). Text Categorisation Using a Partial-Matching Strategy. In: Mira, J., Álvarez, J.R. (eds) Computational Methods in Neural Modeling. IWANN 2003. Lecture Notes in Computer Science, vol 2686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44868-3_34

  • DOI: https://doi.org/10.1007/3-540-44868-3_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40210-7

  • Online ISBN: 978-3-540-44868-6

  • eBook Packages: Springer Book Archive
