GATE, a General Architecture for Text Engineering

Computers and the Humanities

Abstract

This paper presents the design, implementation and evaluation of GATE, a General Architecture for Text Engineering. GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.



Cite this article

Cunningham, H. GATE, a General Architecture for Text Engineering. Computers and the Humanities 36, 223–254 (2002). https://doi.org/10.1023/A:1014348124664

  • DOI: https://doi.org/10.1023/A:1014348124664
