Bridging the gaps: interoperability for language engineering architectures using GrAF

Language Resources and Evaluation

Abstract

This paper explores interoperability for data represented using the Graph Annotation Framework (GrAF) (Ide and Suderman, 2007) and the data formats utilized by two general-purpose annotation systems: the General Architecture for Text Engineering (GATE) (Cunningham et al., 2002) and the Unstructured Information Management Architecture (UIMA) (Ferrucci and Lally, 2004). GrAF is intended to serve as a “pivot” to enable interoperability among different formats, and both GATE and UIMA are at least implicitly designed with an eye toward interoperability with other formats and tools. We describe the steps required to perform a round-trip rendering from GrAF to GATE and GrAF to UIMA CAS and back again, and outline the commonalities as well as the differences and gaps that came to light in the process.

Notes

  1. http://www.oasis-open.org/committees/uima/.

  2. http://incubator.apache.org/uima/index.html.

  3. http://www.anc.org.

  4. http://framenet.icsi.berkeley.edu/.

  5. http://www.anc.org.

  6. XML Corpus Encoding Standard, http://www.xces.org.

  7. http://xaira.sourceforge.net/.

  8. http://www.athel.com/mono.html.

  9. http://ifarm.nl/signll/conll.

  10. http://www.anc.org/graf-api.

  11. http://www.graphviz.org/.

  12. Efficient algorithms for graph merging exist; see, e.g., Habib et al. (2000).

References

  • Bird, S., & Liberman, M. (2001). A formal framework for linguistic annotation. Speech Communication, 33(1–2), 23–60.

  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python (1st ed.). Sebastopol, CA: O’Reilly Media.

  • Bontcheva, K., Tablan, V., Maynard, D., & Cunningham, H. (2004). Evolving GATE to meet new challenges in language engineering. Natural Language Engineering, 10(3–4), 349–373.

  • Bunescu, R. C., & Mooney, R. J. (2007). Extracting relations from text: From word sequences to dependency paths. In: A. Kao & S. Poteet (Eds.), Text mining and natural language processing (pp. 29–44). Berlin: Springer.

  • Cotton, S., & Bird, S. (2002). An integrated framework for treebanks and multilayer annotations. In: Proceedings of the Third International Conference on Language Resources and Evaluation.

  • Cui, H., Sun, R., Li, K., Kan, M.-Y., & Chua, T.-S. (2005). Question answering passage retrieval using dependency relations. In: SIGIR 2005 (pp. 400–407). New York, NY: ACM Press.

  • Cunningham, H., Maynard, D., Bontcheva, K., & Tablan, V. (2002). GATE: A framework and graphical development environment for robust NLP tools and applications. In: Proceedings of ACL’02.

  • Ferrucci, D., & Lally, A. (2004). UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3–4), 327–348.

  • Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (pp. 1606–1611).

  • Grishman, R. (1997). TIPSTER architecture design document version 2.3, technical report, DARPA.

  • Habib, M., Paul, C., & Viennot, L. (2000). Partition refinement techniques: An interesting algorithmic tool kit. International Journal of Foundations of Computer Science 175.

  • Ide, N., & Bunt, H. (2010). Anatomy of annotation schemes: Mapping to GrAF. In: Proceedings of the Fourth Linguistic Annotation Workshop (pp. 247–255). Uppsala, Sweden: Association for Computational Linguistics.

  • Ide, N., & Romary, L. (2004). International standard for a linguistic annotation framework. Journal of Natural Language Engineering, 10(3–4), 211–225.

  • Ide, N., & Suderman, K. (2007). GrAF: A graph-based format for linguistic annotations. In: Proceedings of the linguistic annotation workshop (pp. 1–8). Uppsala, Sweden: Association for Computational Linguistics.

  • Ide, N., Bonhomme, P., & Romary, L. (2000). XCES: An XML-based encoding standard for linguistic corpora. In: Proceedings of the Second International Language Resources and Evaluation Conference. Paris: European Language Resources Association.

  • Ide, N., Baker, C., Fellbaum, C., & Passonneau, R. (2010a). The Manually Annotated Sub-Corpus: A community resource for and by the people. In: Proceedings of the ACL 2010 Conference Short Papers (pp. 68–73). Uppsala, Sweden: Association for Computational Linguistics.

  • Ide, N., Suderman, K., & Simms, B. (2010b). ANC2Go: A web application for customized corpus creation. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC). Valletta, Malta: European Language Resources Association.

  • ISO. (2008). Language resource management—linguistic annotation framework. ISO Document WD 24611.

  • Nguyen, D. P. T., Matsuo, Y., & Ishizuka, M. (2007). Exploiting syntactic and semantic information for relation extraction from Wikipedia. In: IJCAI’07 Workshop on Text-Mining and Link-Analysis (TextLink 2007).

Acknowledgments

This work was supported by an IBM UIMA Innovation Award and National Science Foundation grant INT-0753069.

Author information

Corresponding author

Correspondence to Nancy Ide.

About this article

Cite this article

Ide, N., Suderman, K. Bridging the gaps: interoperability for language engineering architectures using GrAF. Lang Resources & Evaluation 46, 75–89 (2012). https://doi.org/10.1007/s10579-011-9175-7

  • DOI: https://doi.org/10.1007/s10579-011-9175-7
