Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection

Witt, Andreas; Stührenberg, Maik; Goecke, Daniela; Metzing, Dieter

doi:10.1007/978-3-642-22613-7_11

Part of the book series: Studies in Computational Intelligence (SCI, volume 370)

Abstract

Seamless integration of linguistic resources whose output formats are often heterogeneous, together with a combined analysis of their annotation layers, is a crucial task for linguistic research. After a decade of concentration on formats that structure single annotations for specific linguistic issues, recent years have seen a variety of specifications for storing multiple annotations over the same primary data. This paper focuses on integrating one knowledge resource, logical document structure information, into a text document in order to improve automatic anaphora resolution, both in candidate detection and in antecedent selection. It also investigates the data structures necessary for knowledge integration and retrieval.

Author information

Authors and Affiliations

Institut für Deutsche Sprache, Zentrale Forschung, R5, 6 - 13, D-68016, Mannheim, Germany
Andreas Witt
Faculty of Linguistics and Literary Studies, Bielefeld University, Universitätsstraße 25, D-33615, Bielefeld, Germany
Maik Stührenberg, Daniela Goecke & Dieter Metzing

Editor information

Editors and Affiliations

Faculty of Linguistics and Literature, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Alexander Mehler
Institute of Cognitive Science, University of Osnabrück, Albrechtstr. 28, 49076, Osnabrück, Germany
Kai-Uwe Kühnberger
Angewandte Sprachwissenschaft und Computerlinguistik, Justus-Liebig-Universität Gießen, Otto-Behaghel-Straße 10D, 35394, Gießen, Germany
Henning Lobin & Harald Lüngen
Institut für deutsche Sprache und Literatur, Technical University Dortmund, Emil-Figge-Straße 50, 44227, Dortmund, Germany
Angelika Storrer
SFB 441 Linguistic Data Structures, Eberhard Karls Universität Tübingen, Nauklerstraße 35, 72074, Tübingen, Germany
Andreas Witt

About this chapter

Cite this chapter

Witt, A., Stührenberg, M., Goecke, D., Metzing, D. (2011). Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_11

DOI: https://doi.org/10.1007/978-3-642-22613-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering, Engineering (R0)
