An Experience Developing a Semantic Annotation System in a Media Group

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Abstract

Media companies today struggle to manage the large volume of news they receive from agencies together with their own articles. Journalists and documentalists face categorization tasks every day. A further difficulty is the size of the word list in a thesaurus, the tool typically used to tag news in the media, which is usually large.

In this paper, we present a new method for information extraction over a set of texts in which the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and then uses a combination of Support Vector Machines (SVM), ontologies and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
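The abstract does not give implementation details, so the following is only a minimal sketch of a pipeline with the same three stages (lemmatization, keyword extraction, then SVM plus ontology and heuristics), not the authors' system. Everything in it is an assumption made for illustration: the toy lemma table, stopword list, two-term thesaurus, ontology links, training corpus, and all function names are hypothetical. The classifier is a small linear SVM trained by subgradient descent on the hinge loss, written in plain Python so the example has no dependencies.

```python
from collections import Counter

# Toy resources (hypothetical), standing in for a real lemmatizer,
# a real thesaurus, and a real ontology.
LEMMAS = {"wins": "win", "matches": "match", "players": "player",
          "scored": "score", "goals": "goal", "banks": "bank",
          "raised": "raise", "rates": "rate", "fears": "fear",
          "rising": "rise"}
STOPWORDS = {"the", "a", "in", "of"}
THESAURUS = ["economy", "sports"]
ONTOLOGY = {"goal": "sports", "bank": "economy"}  # concept -> thesaurus term

def lemmatize(text):
    """Lowercase, strip punctuation, and map each token to its lemma."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [LEMMAS.get(t, t) for t in tokens if t]

def keywords(text, k=10):
    """Keep the k most frequent non-stopword lemmas."""
    counts = Counter(l for l in lemmatize(text) if l not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def vectorize(kws, vocab):
    """Binary bag-of-keywords vector over a fixed vocabulary."""
    present = set(kws)
    return [1.0 if term in present else 0.0 for term in vocab]

def train_linear_svm(X, y, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on L2-regularized hinge loss; labels are +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            hinge = yi if margin < 1 else 0.0  # active only inside the margin
            w = [wj - lr * (lam * wj - hinge * xj) for wj, xj in zip(w, xi)]
            b += lr * hinge
    return w, b

def train(corpus):
    """Fit one binary SVM per thesaurus term (one-vs-rest)."""
    docs = [keywords(text) for text, _ in corpus]
    vocab = sorted({kw for kws in docs for kw in kws})
    X = [vectorize(kws, vocab) for kws in docs]
    models = {}
    for tag in THESAURUS:
        y = [1 if tag in tags else -1 for _, tags in corpus]
        models[tag] = train_linear_svm(X, y)
    return vocab, models

def annotate(text, vocab, models):
    """Tag a text using SVM decisions plus an ontology-based heuristic."""
    kws = keywords(text)
    x = vectorize(kws, vocab)
    tags = set()
    for tag, (w, b) in models.items():
        if sum(wj * xj for wj, xj in zip(w, x)) + b > 0:
            tags.add(tag)
    # Heuristic: a keyword linked by the ontology to a thesaurus term adds it.
    tags.update(ONTOLOGY[k] for k in kws if k in ONTOLOGY)
    # The annotation may only contain thesaurus elements.
    return sorted(t for t in tags if t in THESAURUS)

CORPUS = [
    ("The team wins the matches", {"sports"}),
    ("Players scored goals in the match", {"sports"}),
    ("Banks raised interest rates", {"economy"}),
    ("The market fears rising rates", {"economy"}),
]
VOCAB, MODELS = train(CORPUS)
print(annotate("The team scored a goal", VOCAB, MODELS))
```

Training one binary classifier per thesaurus term keeps the output restricted to thesaurus elements by construction, and the final filter applies the same constraint to tags proposed by the ontology heuristic.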

This research work has been supported by the CICYT project TIN2010-21387-C02-02.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garrido, A.L., Gómez, O., Ilarri, S., Mena, E. (2012). An Experience Developing a Semantic Annotation System in a Media Group. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_43

  • DOI: https://doi.org/10.1007/978-3-642-31178-9_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31177-2

  • Online ISBN: 978-3-642-31178-9

  • eBook Packages: Computer Science, Computer Science (R0)
