Abstract
Media companies today struggle to manage the large volume of news arriving from agencies together with their own articles. Journalists and documentalists face categorization tasks every day. The problem is compounded by the typically large vocabulary of a thesaurus, the tool most media organizations use to tag news.
In this paper, we present a new method for information extraction over a set of texts whose annotations must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and then uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
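As a rough illustration of the first stages named above (lemmatization, keyword extraction, and a heuristic mapping onto thesaurus entries), the following minimal sketch may help. It is not the system described in the paper: the lemma table, stopword list, thesaurus entries, and the function names `lemmatize`, `keywords`, and `suggest_tags` are all hypothetical, and the SVM and ontology stages are deliberately left out.

```python
from collections import Counter
import re

# Hypothetical toy resources (assumptions for illustration only);
# a real system would use a full lemmatizer and a real thesaurus.
LEMMAS = {"elections": "election", "voted": "vote", "votes": "vote", "parties": "party"}
STOPWORDS = {"the", "a", "in", "of", "and", "for", "on", "to"}
THESAURUS = {
    "election": "POLITICS/ELECTIONS",
    "vote": "POLITICS/ELECTIONS",
    "party": "POLITICS/PARTIES",
}


def lemmatize(text):
    """Lowercase, tokenize, and map each token to its lemma via the toy table."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def keywords(lemmas, top_n=5):
    """Rank non-stopword lemmas by frequency and keep the top_n."""
    counts = Counter(lemma for lemma in lemmas if lemma not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_tags(text, top_n=5):
    """Return thesaurus descriptors matched by the keywords, without duplicates."""
    tags = []
    for word in keywords(lemmatize(text), top_n):
        tag = THESAURUS.get(word)
        if tag and tag not in tags:
            tags.append(tag)
    return tags


if __name__ == "__main__":
    sample = "The parties voted in the elections and the votes of the elections counted."
    print(suggest_tags(sample))
```

In a fuller pipeline, the keyword list would presumably also feed a trained classifier and ontology-based rules before the final tags are chosen; the direct dictionary lookup here only stands in for that step.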
This research work has been supported by the CICYT project TIN2010-21387-C02-02.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garrido, A.L., Gómez, O., Ilarri, S., Mena, E. (2012). An Experience Developing a Semantic Annotation System in a Media Group. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_43
DOI: https://doi.org/10.1007/978-3-642-31178-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)