Abstract
The ISOcat Data Category Registry provides a community computing environment for creating, storing, retrieving, harmonizing and standardizing data category specifications (DCs), which register linguistic terms used in various fields. This chapter recounts the history of DC documentation in TC 37, beginning with paper-based lists created for lexicographers and terminologists and progressing to the development of a web-based resource for a much broader range of users. While describing the considerable strides that have been made in assembling a very large, comprehensive collection of DCs, it also outlines difficulties that have arisen in developing a fully operative web-based computing environment for achieving consensus on data category names, definitions, and selections. It further describes efforts to overcome some of the present shortcomings and to establish positive working procedures designed to engage a wide range of people involved in the creation of language resources.
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
http://lirics.loria.fr retrieved 2012-8-30
- 9.
The email address of the sender is exposed to the recipient so only the first introduction email is mediated.
- 10.
The Athens Core group was named after the first meeting of a large number of metadata modellers for language resources, which took place in Athens in 2009. A series of (online) meetings resulted in a set of more than 200 metadata elements, which were made publicly available in the ISOcat DCR.
- 11.
- 12.
- 13.
Use of a Schema Registry (SCHEMAcat) will allow the persistent storage of resource schemata, each with a persistent identifier (PID) of its own. SCHEMAcat also allows the storage of different versions of a schema. ISOcat is related to SCHEMAcat, and there are also direct links between SCHEMAcat and RELcat; see also [11].
- 14.
The northern part of Belgium with Dutch as its official language.
- 15.
Only for DCs reflecting standards or contained in CLARIN-accepted/recommended.
- 16.
In such a case, a new DC should be constructed (with the same name); the old one should get the status ‘superseded’ and be linked to the new one.
- 17.
A third, assigning two tags (one function-driven and one form-driven), is rarely used.
- 18.
EAGLES: Expert Advisory Group on Language Engineering Standards, cf. especially http://www.ilc.cnr.it/EAGLES96/annotate/node24.html#SECTION00065000000000000000
- 19.
Cf. http://semantic-annotation.uvt.nl/ISO-TimeML-08-13-2008-vankiyong.pdf, Sect. 3.1 (retrieved 2012-08-31)
- 20.
For the time being, links with DCs for such concepts are mentioned in the note section.
- 21.
In the future such a DC will be explicitly recognizable as such.
- 22.
References
Budin G, Melby A (2000) Accessibility of multilingual terminological resources – current problems and prospects for the future. In: Proceedings of the second international conference on language resources and evaluation (LREC’00), Athens, Greece. ELRA
ISO (2008) Annex ST (normative) procedure for the development and maintenance of standards in database format. Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:12200 (1999) Computer applications in terminology – machine-readable terminology interchange format (MARTIF). Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:12620 (1999) Computer applications in terminology – Data categories. Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:12620 (2009) Terminology and other language and content resources – specification of data categories and management of a Data Category Registry for language resources. Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:16642 (2003) Computer applications in terminology – terminological markup framework. Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:24611 (2012) Language resource management – morpho-syntactic annotation framework (MAF). Technical report, International Organization for Standardization, Geneva, Switzerland
ISO:30042 (2008) Systems to manage terminology, knowledge and content – TermBase eXchange (TBX). Technical report, International Organization for Standardization, Geneva, Switzerland
Kemps-Snijders M, Ducret J, Romary L, Wittenburg P (2006) An API for accessing the Data Category Registry. In: Proceedings of the fifth international conference on language resources and evaluation (LREC’06), Genoa, Italy. ELRA
Kuhn T (2000) The road since structure. University of Chicago Press, Chicago. Chap Commensurability, Comparability, Communicability
Schuurman I, Windhouwer M (2011) Explicit semantics for enriched documents. What do ISOcat, RELcat and SCHEMAcat have to offer? In: Proceedings of the second supporting digital humanities conference, Copenhagen, Denmark
Váradi T, Krauwer S, Wittenburg P, Wynne M, Koskenniemi K (2008) CLARIN: Common language resources and technology infrastructure. In: Proceedings of the sixth international conference on language resources and evaluation (LREC’08), Marrakech, Morocco. ELRA
Windhouwer M (2012) RELcat: a Relation registry for ISOcat data categories. In: Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey. ELRA
Wright SE, Budin G (2001) Handbook of terminology management: application-oriented terminology management. John Benjamins Publishing, Amsterdam
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wright, S.E., Windhouwer, M., Schuurman, I., Kemps-Snijders, M. (2013). Community Efforts Around the ISOcat Data Category Registry. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_13
DOI: https://doi.org/10.1007/978-3-642-35085-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6
eBook Packages: Computer Science (R0)