Abstract
This paper explores interoperability for data represented using the Graph Annotation Framework (GrAF) (Ide and Suderman, 2007) and the data formats used by two general-purpose annotation systems: the General Architecture for Text Engineering (GATE) (Cunningham et al., 2002) and the Unstructured Information Management Architecture (UIMA) (Ferrucci and Lally, 2004). GrAF is intended to serve as a “pivot” to enable interoperability among different formats, and both GATE and UIMA are at least implicitly designed with an eye toward interoperability with other formats and tools. We describe the steps required to perform a round-trip rendering from GrAF to GATE and from GrAF to UIMA CAS and back again, and outline the commonalities as well as the differences and gaps that came to light in the process.
Notes
XML Corpus Encoding Standard, http://www.xces.org.
Efficient algorithms for graph merging exist; see, e.g., Habib et al. (2000).
References
Bird, S., & Liberman, M. (2001). A formal framework for linguistic annotation. Speech Communication, 33(1–2), 23–60.
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python (1st ed.). Sebastopol, CA: O’Reilly Media.
Bontcheva, K., Tablan, V., Maynard, D., & Cunningham, H. (2004). Evolving GATE to meet new challenges in language engineering. Natural Language Engineering, 10(3–4), 349–373.
Bunescu, R. C., & Mooney, R. J. (2007). Extracting relations from text: From word sequences to dependency paths. In: A. Kao & S. Poteet (Eds.), Text mining and natural language processing (pp. 29–44). Berlin: Springer.
Cotton, S., & Bird, S. (2002). An integrated framework for treebanks and multilayer annotations. In: Proceedings of the Third International Conference on Language Resources and Evaluation.
Cui, H., Sun, R., Li, K., Kan, M.-Y., & Chua, T.-S. (2005). Question answering passage retrieval using dependency relations. In: SIGIR 2005 (pp. 400–407). New York, NY: ACM Press.
Cunningham, H., Maynard, D., Bontcheva, K., & Tablan, V. (2002). GATE: A framework and graphical development environment for robust NLP tools and applications. In: Proceedings of ACL’02.
Ferrucci, D., & Lally, A. (2004). UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3–4), 327–348.
Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (pp. 1606–1611).
Grishman, R. (1997). TIPSTER architecture design document version 2.3, technical report, DARPA.
Habib, M., Paul, C., & Viennot, L. (2000). Partition refinement techniques: An interesting algorithmic tool kit. International Journal of Foundations of Computer Science 175.
Ide, N., & Bunt, H. (2010). Anatomy of annotation schemes: Mapping to GrAF. In: Proceedings of the Fourth Linguistic Annotation Workshop (pp. 247–255). Uppsala, Sweden: Association for Computational Linguistics.
Ide, N., & Romary, L. (2004). International standard for a linguistic annotation framework. Natural Language Engineering, 10(3–4), 211–225.
Ide, N., & Suderman, K. (2007). GrAF: A graph-based format for linguistic annotations. In: Proceedings of the Linguistic Annotation Workshop (pp. 1–8). Prague, Czech Republic: Association for Computational Linguistics.
Ide, N., Bonhomme, P., & Romary, L. (2000). XCES: An XML-based encoding standard for linguistic corpora. In: Proceedings of the Second International Language Resources and Evaluation Conference. Paris: European Language Resources Association.
Ide, N., Baker, C., Fellbaum, C., & Passonneau, R. (2010a). The Manually Annotated Sub-Corpus: A community resource for and by the people. In: Proceedings of the ACL 2010 Conference Short Papers (pp. 68–73). Uppsala, Sweden: Association for Computational Linguistics.
Ide, N., Suderman, K., & Simms, B. (2010b). ANC2Go: A web application for customized corpus creation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC). Valletta, Malta: European Language Resources Association.
ISO. (2008). Language resource management—linguistic annotation framework. ISO Document WD 24611.
Nguyen, D. P. T., Matsuo, Y., & Ishizuka, M. (2007). Exploiting syntactic and semantic information for relation extraction from Wikipedia. In: IJCAI’07 Workshop on Text-Mining and Link-Analysis (TextLink 2007).
Acknowledgments
This work was supported by an IBM UIMA Innovation Award and National Science Foundation grant INT-0753069.
Cite this article
Ide, N., Suderman, K. Bridging the gaps: interoperability for language engineering architectures using GrAF. Lang Resources & Evaluation 46, 75–89 (2012). https://doi.org/10.1007/s10579-011-9175-7
DOI: https://doi.org/10.1007/s10579-011-9175-7