
Semantic Annotations and Retrieval: Manual, Semiautomatic, and Automatic Generation

  • Reference work entry

Abstract

The semantic annotation of textual Web content is key to the success of the Semantic Web. This entry reviews the main approaches and state-of-the-art systems, and draws conclusions about outstanding challenges and future work.

First, the problem of semantic annotation is defined and distinguished from related research fields. Manual annotation tools are then discussed in the context of key requirements, such as support for diverse document formats, multiple ontologies, and collaborative, Web-based annotation.

Next, the entry discusses ontology-oriented, semiautomatic, and automatic systems, which typically target ontologies as their output format but do not use them as a knowledge resource during semantic analysis. A number of more advanced, ontology-based semantic annotation approaches are then presented and compared. Particular emphasis is placed on scalability (i.e., the ability to process millions of documents) and customization (i.e., how easily these systems can be adapted to new domains and/or ontologies).
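To make the difference concrete, the sketch below shows one simple way an ontology can serve as a knowledge resource during annotation: the labels (lexicalizations) of ontology instances are compiled into a gazetteer, and each matching text span is recorded as a standoff annotation linking character offsets to an instance URI and its class. This is a minimal illustration, not the method of any system reviewed in this entry; the toy ontology, the `example.org` URIs, and the names `build_gazetteer`, `annotate`, and `Annotation` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical toy ontology fragment: instance URI -> class URI and labels.
ONTOLOGY = {
    "http://example.org/onto#Sheffield": {
        "class": "http://example.org/onto#City",
        "labels": ["Sheffield"],
    },
    "http://example.org/onto#UniversityOfSheffield": {
        "class": "http://example.org/onto#University",
        "labels": ["University of Sheffield", "Sheffield University"],
    },
}


@dataclass(frozen=True)
class Annotation:
    start: int     # character offset where the mention begins
    end: int       # character offset just past the mention
    text: str      # surface string as it appears in the document
    instance: str  # URI of the ontology instance the mention may refer to
    cls: str       # URI of that instance's class


def build_gazetteer(ontology):
    """Map each lower-cased label to every instance URI it may denote."""
    gazetteer = {}
    for uri, entry in ontology.items():
        for label in entry["labels"]:
            gazetteer.setdefault(label.lower(), []).append(uri)
    return gazetteer


def annotate(text, ontology):
    """Left-to-right, longest-match lookup of ontology labels in text."""
    gazetteer = build_gazetteer(ontology)
    # Longer labels come first in the alternation, so at any position
    # "University of Sheffield" is preferred over "Sheffield".
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    annotations = []
    for match in pattern.finditer(text):
        # An ambiguous label yields one candidate annotation per instance;
        # choosing among them is left to a separate disambiguation step.
        for uri in gazetteer[match.group(0).lower()]:
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0),
                           uri, ontology[uri]["class"])
            )
    return annotations


doc = "She studied at the University of Sheffield before moving back to Sheffield."
for a in annotate(doc, ONTOLOGY):
    print(a.start, a.end, a.text, a.instance)
```

Plain dictionary lookup like this ignores context, so a label shared by several instances produces several candidates; the ambiguity and scalability issues discussed in this entry arise precisely when such an ontology grows to millions of instances.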

Semantic retrieval enables users to find all documents that mention one or more instances and/or relations from the ontology. Queries can also mix free-text keywords with semantic annotations. Different types of retrieval tools are reviewed, some of which provide document browsing functionality as well as search refinement capabilities.

The entry then provides in-depth examples of three semantic annotation applications: the GATE framework, News Collector, and large-scale patent processing. Future issues to be addressed include making use of linked data; dealing with large-scale, highly ambiguous ontologies; multilinguality; lexicalization of ontologies; and, from an implementation perspective, semantic annotation as a service.
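A query that combines ontology constraints with free-text keywords can be sketched as a filter over annotated documents. The example below assumes each document already carries (instance, class) annotations, for instance from an annotator like the one sketched earlier, and uses a small, hypothetical class hierarchy so that a query for `Location` also matches documents mentioning a `City`. The names `Document`, `SUBCLASS_OF`, `is_subclass`, and `search` are illustrative only and do not correspond to any tool reviewed in this entry.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy class hierarchy: class -> direct superclass.
SUBCLASS_OF = {
    "City": "Location",
    "Country": "Location",
    "University": "Organization",
    "Company": "Organization",
}


@dataclass
class Document:
    doc_id: str
    text: str
    # (instance, class) pairs assumed to come from a semantic annotator
    annotations: set = field(default_factory=set)


def is_subclass(cls, ancestor):
    """True if cls is ancestor or a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search(docs, instances=(), classes=(), keywords=()):
    """Ids of documents that mention every listed instance, mention some
    instance of every listed class (subclasses included), and contain
    every listed free-text keyword."""
    hits = []
    for doc in docs:
        mentioned = {inst for inst, _ in doc.annotations}
        words = set(re.findall(r"\w+", doc.text.lower()))
        if not all(inst in mentioned for inst in instances):
            continue
        if not all(any(is_subclass(c, wanted) for _, c in doc.annotations)
                   for wanted in classes):
            continue
        if not all(k.lower() in words for k in keywords):
            continue
        hits.append(doc.doc_id)
    return hits


docs = [
    Document("d1", "Sheffield researchers announce a merger.",
             {("onto:Sheffield", "City")}),
    Document("d2", "The University of Sheffield opens a new lab.",
             {("onto:UniversityOfSheffield", "University"),
              ("onto:Sheffield", "City")}),
    Document("d3", "Acme Corp announces a merger with a rival.",
             {("onto:Acme", "Company")}),
]
print(search(docs, classes=["Location"], keywords=["merger"]))
```

Here `d3` contains the keyword but mentions no `Location`, and `d2` mentions a `Location` but not the keyword, so only `d1` satisfies both constraints. The linear scan is kept for readability; a system meant to handle millions of documents would instead index annotations and keywords together and delegate class subsumption to a repository with reasoning support.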

Author information

Correspondence to Kalina Bontcheva.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this entry

Cite this entry

Bontcheva, K., Cunningham, H. (2011). Semantic Annotations and Retrieval: Manual, Semiautomatic, and Automatic Generation. In: Domingue, J., Fensel, D., Hendler, J.A. (eds) Handbook of Semantic Web Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92913-0_3