Learning to Tag Text from Rules and Examples

Diligenti, Michelangelo; Gori, Marco; Maggini, Marco

doi:10.1007/978-3-642-23954-0_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6934)

Abstract

Tagging has become a popular way to improve access to resources, especially in social networks and folksonomies. Most resource-sharing tools let community members label the available items manually. However, manual tagging can fail to be consistent, especially as the tag vocabulary grows and users consequently stop complying with shared semantic knowledge. Automatic tagging can therefore be an effective way to complement the manually added tags, especially for dynamic or very large document collections such as the Web. However, when an automatic text tagger is trained on the tags inserted by users, it may inherit the inconsistencies of the training data. In this paper, we propose a novel approach in which a set of text categorizers, each associated with a tag in the vocabulary, is trained both from examples and from a higher-level abstract representation consisting of first-order logic (FOL) clauses that describe semantic rules constraining the use of the corresponding tags. The FOL clauses are compiled into a set of equivalent continuous constraints, and the integration between logic and learning is implemented in a multi-task learning scheme. In particular, we exploit the mathematical apparatus of kernel machines, casting the problem as the primal optimization of a function composed of the loss on the supervised examples, a regularization term, and a penalty term that enforces the constraints resulting from the conversion of the logic knowledge. The experimental results show that the proposed approach provides a significant accuracy improvement on the tagging of BibTeX entries.
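The three-part objective described in the abstract (supervised loss, regularization, and a penalty derived from logic rules) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the rule is a single implication between two hypothetical tags, it is relaxed with the product t-norm, each tag function is a linear model squashed by a sigmoid rather than a kernel expansion, and the names `implication_penalty`, `objective`, `lam_reg`, and `lam_con` are invented.

```python
import numpy as np


def sigmoid(z):
    """Squash real-valued scores into [0, 1] so they can act as truth degrees."""
    return 1.0 / (1.0 + np.exp(-z))


def implication_penalty(f_a, f_b):
    """Continuous penalty for the rule: for all x, a(x) => b(x).

    Assumption: product t-norm. The clause a => b is rewritten as
    not(a and not b), whose truth degree is 1 - f_a * (1 - f_b).
    The penalty is the mean degree of violation, f_a * (1 - f_b).
    """
    f_a = np.clip(np.asarray(f_a, dtype=float), 0.0, 1.0)
    f_b = np.clip(np.asarray(f_b, dtype=float), 0.0, 1.0)
    return float(np.mean(f_a * (1.0 - f_b)))


def objective(W, X_lab, Y_lab, X_all, rules, lam_reg=0.1, lam_con=1.0):
    """Loss on labeled examples + regularization + rule penalty.

    W      : (n_tags, n_features) weights, one linear function per tag
             (a stand-in for the kernel machines mentioned in the abstract).
    X_lab  : labeled documents, Y_lab their 0/1 tag targets.
    X_all  : all documents (labeled or not) on which the rules are enforced.
    rules  : list of (a, b) tag-index pairs meaning tag a implies tag b.
    """
    F_lab = sigmoid(X_lab @ W.T)
    F_all = sigmoid(X_all @ W.T)
    loss = np.mean((F_lab - Y_lab) ** 2)          # supervised fitting term
    reg = np.sum(W ** 2)                          # smoothness / complexity term
    penalty = sum(implication_penalty(F_all[:, a], F_all[:, b])
                  for a, b in rules)              # logic constraint term
    return float(loss + lam_reg * reg + lam_con * penalty)


# Hypothetical usage: two tags, rule "tag 0 implies tag 1".
X = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
Y = np.array([[1.0, 1.0], [0.0, 0.0]])
W0 = np.zeros((2, 3))
value = objective(W0, X, Y, X, rules=[(0, 1)])
```

Because the penalty is evaluated on `X_all`, unlabeled documents also shape the solution in this sketch: the rule couples the tag functions, which is what makes the scheme multi-task rather than a set of independent classifiers.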

Author information

Authors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università di Siena, Italy
Michelangelo Diligenti, Marco Gori & Marco Maggini

Editor information

Editors and Affiliations

Department of Chemical, Management, Computer, and Mechanical Engineering (DICGIM), University of Palermo, Viale delle Scienze, Edificio 6, 90128, Palermo, Italy
Roberto Pirrone & Filippo Sorbello

Cite this paper

Diligenti, M., Gori, M., Maggini, M. (2011). Learning to Tag Text from Rules and Examples. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_7

DOI: https://doi.org/10.1007/978-3-642-23954-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23953-3
Online ISBN: 978-3-642-23954-0
eBook Packages: Computer Science, Computer Science (R0)
