An Experience Developing a Semantic Annotation System in a Media Group

Garrido, Angel L.; Gómez, Oscar; Ilarri, Sergio; Mena, Eduardo

doi:10.1007/978-3-642-31178-9_43

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Nowadays, media companies have difficulty managing the large volume of news from agencies and of articles written in-house. Journalists and documentalists face categorization tasks every day. A further difficulty is that the thesaurus, the tool typically used to tag news in the media, usually contains a very large list of terms.

In this paper, we present a new method to tackle the problem of information extraction over a set of texts where the annotation must be composed of thesaurus elements. The method consists of applying lemmatization, obtaining keywords, and finally using a combination of Support Vector Machines (SVM), ontologies and heuristics to deduce appropriate tags for the annotation. We have evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
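
The three stages named in the abstract (lemmatization, keyword extraction, tag deduction) can be illustrated with a minimal sketch. It is not the authors' implementation: the lemma table, stopword list, tag weights and entity table are hypothetical toy data, a fixed linear scorer stands in for a trained SVM's decision function, and a plain entity lookup stands in for the ontology and heuristics.

```python
from collections import Counter

# Hypothetical toy resources (assumptions); the paper's real lemmatizer,
# thesaurus and ontology are not reproduced here.
LEMMAS = {"elections": "election", "voted": "vote", "votes": "vote",
          "parties": "party", "goals": "goal", "scored": "score"}
STOPWORDS = {"the", "a", "in", "of", "and", "for", "to", "on"}

# Per-tag linear weights over keywords: a stand-in for the decision
# function of a trained SVM, not a trained model.
TAG_WEIGHTS = {
    "POLITICS": {"election": 1.0, "vote": 0.8, "party": 0.6},
    "SPORTS": {"goal": 1.0, "score": 0.7, "match": 0.6},
}

# Toy "ontology": named entities mapped directly to thesaurus tags.
ENTITY_TAGS = {"zaragoza": "ARAGON"}


def lemmatize(text):
    """Lowercase, strip punctuation, and map each token to its lemma."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [LEMMAS.get(t, t) for t in tokens if t]


def keywords(lemmas, top_n=5):
    """Return the top_n most frequent non-stopword lemmas with counts."""
    counts = Counter(l for l in lemmas if l not in STOPWORDS)
    return dict(counts.most_common(top_n))


def annotate(text, threshold=1.0):
    """Deduce thesaurus tags: linear scoring plus an entity-lookup heuristic."""
    lemmas = lemmatize(text)
    kw = keywords(lemmas)
    tags = set()
    for tag, weights in TAG_WEIGHTS.items():
        score = sum(weights.get(k, 0.0) * n for k, n in kw.items())
        if score >= threshold:
            tags.add(tag)
    # Heuristic step: any lemma found in the entity table adds its tag.
    tags.update(ENTITY_TAGS[l] for l in lemmas if l in ENTITY_TAGS)
    return sorted(tags)


print(annotate("The parties voted in the elections of Zaragoza."))
```

Restricting the output to keys of `TAG_WEIGHTS` and values of `ENTITY_TAGS` mirrors the constraint that annotations must be composed of thesaurus elements; in a realistic setting the weights would be learned from news previously tagged by a documentation department.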

This research work has been supported by the CICYT project TIN2010-21387-C02-02.

Author information

Authors and Affiliations

Grupo Heraldo - Grupo La Información, Zaragoza, Pamplona, Spain
Angel L. Garrido & Oscar Gómez
IIS Department, University of Zaragoza, Zaragoza, Spain
Sergio Ilarri & Eduardo Mena

Editor information

Editors and Affiliations

Information Science Department, University of Groningen, Oude Kijk in ’t Jatstraat 26, 9712 EK, Groningen, The Netherlands
Gosse Bouma
Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE, Groningen, The Netherlands
Ashwin Ittoo & Hans Wortmann
CNAM-Laboratoire Cédric, 292 rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

About this paper

Cite this paper

Garrido, A.L., Gómez, O., Ilarri, S., Mena, E. (2012). An Experience Developing a Semantic Annotation System in a Media Group. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_43

DOI: https://doi.org/10.1007/978-3-642-31178-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)
