Abstract
Language resources can be interoperable at two levels: operational and conceptual. Operational interoperability refers to the standardization of the formal aspects of language resources so that different resources can work together. Conceptual interoperability refers to the standardization of the notional representation of the semantic content of the analysis. This article addresses both issues but focuses on the latter through a description of the annotation and analysis of the International Corpus of English, a corpus for the study of English as a global language. The project is parameterised by component, regional sub-corpora and a set of pre-defined textual categories. The one-million-word British component has been constructed, grammatically tagged, and syntactically parsed. The article first describes the steps taken to ensure conformity within the project: corpus design, part-of-speech tagging, and syntactic parsing. It then presents a study of the use of adverbial clauses across speech and writing, illustrating the pressing need for interoperable analysis of linguistic data.
Notes
The X axis in Fig. 3 has legends indicating the proportion of adverbial clauses in the following groups of samples in ICE–GB:
- Spon: spontaneous conversations
- Speech: complete spoken samples
- Scripted: scripted broadcast news and talks
- Timed: timed university essays
- Writing: complete written samples
- Untimed: untimed university essays
Acknowledgments
This work was supported in part by research grants from City University of Hong Kong (Project Nos 7002387, 7008002, 9610126 and 9610053).
Cite this article
Fang, A.C. Creating an interoperable language resource for interoperable linguistic studies. Lang Resources & Evaluation 46, 327–340 (2012). https://doi.org/10.1007/s10579-012-9189-9
DOI: https://doi.org/10.1007/s10579-012-9189-9