Abstract
This paper presents a family of rule learners whose rules are applied according to a partial-matching criterion based on different purity measures. The behavior of these rule learners is tested on a Text Categorisation problem. To illustrate the advantages of each learner, the MDL-based method of C4.5 is replaced by a pruning process whose performance relies on an estimate of the quality of the rules. Empirical results show that, in general, inducing partial-matching rules yields more compact rule sets without degrading performance as measured by microaveraged F1, one of the most common performance measures in Information Retrieval tasks. In particular, some purity measures produce significantly fewer rules than C4.5 while performance measured with F1 is not degraded.
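For readers unfamiliar with the evaluation measure named above, the following is a minimal sketch of microaveraged F1 in its standard form: true positives, false positives and false negatives are pooled over all categories before precision and recall are combined. The function name `micro_f1` and the per-category counts are illustrative assumptions and are not taken from the paper or its experiments.

```python
from typing import Iterable, Tuple


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Microaveraged F1 from per-category (tp, fp, fn) counts.

    Counts are summed over all categories first, so frequent
    categories weigh more than rare ones.
    """
    tp = fp = fn = 0
    for c_tp, c_fp, c_fn in counts:
        tp += c_tp
        fp += c_fp
        fn += c_fn
    # F1 = 2PR / (P + R) simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for three categories (made up for illustration).
per_category = [(8, 2, 1), (1, 0, 4), (3, 1, 1)]
score = micro_f1(per_category)
print(f"micro-averaged F1 = {score:.4f}")
```

Because the counts are pooled, a learner that does well on the largest categories can score highly even if it misses small ones, which is worth keeping in mind when comparing rule sets of different sizes.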
The research reported in this paper has been supported in part by MCyT and Feder under grant TIC2001-3579.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ranilla, J., Díaz, I., Fernández, J. (2003). Text Categorisation Using a Partial-Matching Strategy. In: Mira, J., Álvarez, J.R. (eds) Computational Methods in Neural Modeling. IWANN 2003. Lecture Notes in Computer Science, vol 2686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44868-3_34
DOI: https://doi.org/10.1007/3-540-44868-3_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40210-7
Online ISBN: 978-3-540-44868-6
eBook Packages: Springer Book Archive