Abstract
Media companies today struggle to manage the large volume of news they receive from agencies and produce in-house. Journalists and documentalists face categorization tasks every day. A further difficulty is the typically large vocabulary of a thesaurus, the tool commonly used to tag news in the media.
In this paper, we present a new method for information extraction over a set of texts whose annotations must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and finally uses a combination of Support Vector Machines (SVM), ontologies and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, continually changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
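The abstract describes a pipeline of lemmatization, keyword extraction, and a classification stage that maps texts to thesaurus tags. The Python sketch below illustrates that flow under heavy simplifying assumptions, and is not the authors' implementation: the lemma table, stopword list and thesaurus fragment are hypothetical toy data; the names `lemmatize`, `extract_keywords`, `train_linear_svm` and `tag` are illustrative; the SVM is a minimal Pegasos-style linear SVM (no bias term) trained for a single hypothetical "SPORTS" category; and the ontology reasoning and heuristics mentioned above are not modeled.

```python
import re
from collections import Counter

# Hypothetical toy lemma table; a real system would use a morphological analyzer.
LEMMAS = {"goals": "goal", "wins": "win", "won": "win", "banks": "bank",
          "rates": "rate", "raised": "raise", "markets": "market"}
STOPWORDS = {"the", "a", "of", "and", "in", "its", "on", "by"}
# Hypothetical thesaurus fragment: lemma -> preferred descriptor.
THESAURUS = {"goal": "SPORTS/Football", "match": "SPORTS/Competitions",
             "bank": "ECONOMY/Banking", "rate": "ECONOMY/Interest rates"}

def lemmatize(text):
    """Lowercase, tokenize on letters, drop stopwords, map tokens to lemmas."""
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

def extract_keywords(lemmas, k=5):
    """Top-k lemmas by frequency; ties broken alphabetically for determinism."""
    counts = Counter(lemmas)
    return [w for w, _ in sorted(counts.items(), key=lambda p: (-p[1], p[0]))[:k]]

def features(lemmas):
    """Sparse bag-of-lemmas vector: lemma -> count."""
    return Counter(lemmas)

def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(samples, labels, lam=0.1, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 or -1; no bias term, for brevity."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin measured with the current weights
            for f in w:             # regularization: shrink every weight
                w[f] *= 1 - eta * lam
            if margin < 1:          # hinge loss active: step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def tag(text, w, k=5):
    """Thesaurus lookup over keywords, plus the SVM decision for one category."""
    lemmas = lemmatize(text)
    tags = {THESAURUS[kw] for kw in extract_keywords(lemmas, k) if kw in THESAURUS}
    if dot(w, features(lemmas)) > 0:
        tags.add("SPORTS")
    return sorted(tags)

# Usage: train the hypothetical "SPORTS" classifier on four toy items (+1 = sports).
docs = ["Goals and wins: the team won its match", "Late goals decided the match",
        "Banks raised interest rates", "Markets fell as banks cut rates"]
labels = [1, 1, -1, -1]
w = train_linear_svm([features(lemmatize(d)) for d in docs], labels)
print(tag("A late goal won the derby", w))  # ['SPORTS', 'SPORTS/Football']
print(tag("Central banks keep rates", w))   # ['ECONOMY/Banking', 'ECONOMY/Interest rates']
```

Keeping each stage as a separate function mirrors the staged structure described in the abstract, so a real lemmatizer, a larger thesaurus or a library SVM could replace any one piece without touching the others.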
This research work has been supported by the CICYT project TIN2010-21387-C02-02.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garrido, A.L., Gómez, O., Ilarri, S., Mena, E. (2012). An Experience Developing a Semantic Annotation System in a Media Group. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_43
DOI: https://doi.org/10.1007/978-3-642-31178-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)