Scalable Semantic Annotation of Text Using Lexical and Web Resources

  • Conference paper
Artificial Intelligence: Theories, Models and Applications (SETN 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6040)

Abstract

This paper addresses the task of adding domain-specific semantic tags to a document, based solely on the domain ontology and on generic lexical and Web resources. This avoids the need for trained domain-specific lexical resources, which hinder the scalability of semantic annotation. More specifically, the proposed method maps the content of the document to concepts of the ontology, using the WordNet lexicon and Wikipedia. The method comprises a novel combination of semantic relatedness measures and word sense disambiguation techniques to identify the ontology concepts most related to the document. We test the method on two case studies: (a) a set of summaries accompanying environmental news videos, and (b) a set of medical abstracts. The results in both cases show that the proposed method achieves reasonable performance, pointing to a promising path for scalable semantic annotation of documents.
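
The general idea of mapping document content to ontology concepts can be illustrated with a minimal sketch. This is not the method proposed in the paper: it replaces the WordNet- and Wikipedia-based relatedness measures with a plain word-overlap score and performs no word sense disambiguation. The concept names, the gloss word sets in `TOY_GLOSSES`, and the function `annotate` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy "ontology": each concept is described by a small set of
# related words. In a real system these would come from lexical or Web
# resources rather than being written by hand.
TOY_GLOSSES: Dict[str, Set[str]] = {
    "flood": {"water", "overflow", "river", "rain"},
    "pollution": {"contamination", "air", "water", "waste"},
    "earthquake": {"ground", "shaking", "fault", "seismic"},
}


def annotate(
    text: str,
    glosses: Dict[str, Set[str]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank concepts by the share of their vocabulary found in the text.

    The score is a simple overlap ratio, used here as a stand-in for a
    proper semantic relatedness measure (an assumption of this sketch).
    """
    tokens = set(text.lower().split())
    scored: List[Tuple[str, float]] = []
    for concept, gloss in glosses.items():
        vocabulary = gloss | {concept}
        score = len(tokens & vocabulary) / len(vocabulary)
        if score > 0.0:
            scored.append((concept, score))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    print(annotate("Heavy rain caused the river to overflow", TOY_GLOSSES))
```

With the toy data above, the example sentence shares three of the five words associated with the hypothetical concept `flood` and none with the other two concepts, so only `flood` is returned as a tag. A realistic system would need a relatedness measure that also credits words that are related but not identical, which is where resources such as WordNet and Wikipedia come in.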

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zavitsanos, E., Tsatsaronis, G., Varlamis, I., Paliouras, G. (2010). Scalable Semantic Annotation of Text Using Lexical and Web Resources. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science, vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_32

  • DOI: https://doi.org/10.1007/978-3-642-12842-4_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12841-7

  • Online ISBN: 978-3-642-12842-4

  • eBook Packages: Computer Science, Computer Science (R0)
