Text Categorisation Using a Partial-Matching Strategy

Ranilla, J.; Díaz, I.; Fernández, J.

doi:10.1007/3-540-44868-3_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2686)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

This paper presents a family of rule learners whose rules are applied according to a partial-matching criterion based on different purity measures. The behavior of these rule learners is tested on a Text Categorisation problem. To illustrate the advantages of each learner, the MDL-based method of C4.5 is replaced by a pruning process whose performance relies on an estimate of the quality of the rules. Empirical results show that, in general, inducing partial-matching rules yields more compact rule sets without degrading performance measured in terms of microaveraged F1, one of the most common performance measures in Information Retrieval tasks. In particular, some purity measures produce significantly fewer rules than C4.5 while performance measured with F1 is not degraded.
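The abstract reports performance as microaveraged F1 but does not define it. As a reminder of the standard definition (not code from the paper), the minimal sketch below pools true positives, false positives and false negatives over all categories and computes F1 once from the pooled counts, F1 = 2TP / (2TP + FP + FN). The function name `micro_f1` and the per-category counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import Dict, Tuple

# Per-category contingency counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def micro_f1(per_category: Dict[str, Counts]) -> float:
    """Microaveraged F1: pool the counts over all categories, then compute F1 once."""
    tp = sum(c[0] for c in per_category.values())
    fp = sum(c[1] for c in per_category.values())
    fn = sum(c[2] for c in per_category.values())
    denom = 2 * tp + fp + fn
    # By convention here, return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not taken from the paper).
example = {
    "earn": (90, 10, 5),
    "acq": (40, 5, 15),
    "grain": (5, 5, 10),
}

print(micro_f1(example))
```

Because the counts are pooled before the ratio is taken, categories with many documents dominate the score, which is why microaveraging is usually reported alongside, or contrasted with, macroaveraging in text categorisation work.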

The research reported in this paper has been supported in part under MCyT and Feder grant TIC2001-3579.

Author information

Authors and Affiliations

Artificial Intelligence Center, Campus de Viesques, E-33271, Gijón, Spain
J. Ranilla, I. Díaz & J. Fernández

Editor information

Editors and Affiliations

E.T.S. de Ingeniería Informática Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Juan del Rosal, 16, 28040, Madrid, Spain
José Mira & José R. Álvarez

About this paper

Cite this paper

Ranilla, J., Díaz, I., Fernández, J. (2003). Text Categorisation Using a Partial-Matching Strategy. In: Mira, J., Álvarez, J.R. (eds) Computational Methods in Neural Modeling. IWANN 2003. Lecture Notes in Computer Science, vol 2686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44868-3_34

DOI: https://doi.org/10.1007/3-540-44868-3_34
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40210-7
Online ISBN: 978-3-540-44868-6
eBook Packages: Springer Book Archive
