Definitional and human constraints on structural annotation of English*

GEOFFREY SAMPSON; ANNA BABARCZY

doi:10.1017/S1351324908004695

Published online by Cambridge University Press: 01 October 2008

GEOFFREY SAMPSON: Department of Informatics, University of Sussex, Falmer, Brighton, BN1 9QJ, England; e-mail: grs2@sussex.ac.uk
ANNA BABARCZY: Department of Cognitive Science, Budapest University of Technology & Economics, 1111 Budapest, Stoczek utca 2, Hungary; e-mail: babarczy@cogsci.bme.hu

Abstract

The limits on the predictability and refinement of English structural annotation are examined by comparing independent annotations of a common sample of written texts, produced by experienced analysts using the same detailed published guidelines. Three conclusions emerge. First, although it is not easy to define watertight boundaries between the categories of a comprehensive structural annotation scheme, in practice inter-annotator agreement is limited more by the difficulty of conforming to a well-defined scheme than by the difficulty of making a scheme well defined. Second, although usage is often structurally ambiguous, the alternative analyses are commonly logical distinctions without a practical difference, which raises questions about the role of grammar in human linguistic behaviour. Finally, one specific area of annotation, classifying the functions of clause constituents, is strikingly more problematic than any other area examined, even though this area seems particularly significant for human language use. These findings should interest both computational linguists and students of language as an aspect of human cognition.
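Comparing independent annotations starts from some measure of how often two analysts assign the same category to the same item. The following is a minimal Python sketch, not the measure used in this paper: it computes raw agreement and Cohen's kappa (a widely used chance-corrected statistic, swapped in here for illustration) over two aligned sequences of clause-constituent function labels. The function names, the label set ("S" subject, "O" object, "A" adverbial) and the eight-item data are all hypothetical.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of aligned items on which the two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned item by item")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_o = observed_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labelled independently according to their own label frequencies.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both used one identical label throughout; kappa is
        # undefined, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical function labels for eight clause constituents.
annotator_1 = ["S", "O", "A", "S", "O", "A", "S", "O"]
annotator_2 = ["S", "O", "O", "S", "O", "A", "S", "A"]

print(observed_agreement(annotator_1, annotator_2))  # 0.75
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.619
```

Kappa comes out lower than raw agreement because, with only three labels in roughly equal use, about a third of matches would be expected by chance alone.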

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 4, October 2008, pp. 471–494
Copyright © Cambridge University Press 2008