Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection

Modeling, Learning, and Processing of Text Technological Data Structures

Part of the book series: Studies in Computational Intelligence ((SCI,volume 370))

Abstract

Two tasks are crucial for linguistic research: the seamless integration of linguistic resources whose output formats are often heterogeneous, and the combined analysis of their annotation layers. After a decade focused on formats that structure single annotations for specific linguistic issues, recent years have produced a variety of specifications for storing multiple annotations over the same primary data. This paper focuses on integrating one knowledge resource, logical document structure information, into a text document in order to improve automatic anaphora resolution in both of its subtasks, candidate detection and antecedent selection. It also investigates the data structures necessary for knowledge integration and retrieval.

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Witt, A., Stührenberg, M., Goecke, D., Metzing, D. (2011). Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_11

  • DOI: https://doi.org/10.1007/978-3-642-22613-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22612-0

  • Online ISBN: 978-3-642-22613-7

  • eBook Packages: Engineering, Engineering (R0)
