Community Efforts Around the ISOcat Data Category Registry

The People’s Web Meets NLP

Abstract

The ISOcat Data Category Registry provides a community computing environment for creating, storing, retrieving, harmonizing and standardizing data category specifications (DCs), which are used to register linguistic terms in various fields. This chapter recounts the history of DC documentation in TC 37, beginning with paper-based lists created for lexicographers and terminologists and progressing to the development of a web-based resource for a much broader range of users. While describing the considerable strides that have been made in assembling a very large, comprehensive collection of DCs, it also outlines the difficulties that have arisen in developing a fully operative web-based computing environment for achieving consensus on data category names, definitions, and selections. Finally, it describes efforts to overcome some of the present shortcomings and to establish positive working procedures designed to engage a wide range of people involved in the creation of language resources.

Notes

  1. See http://www.isocat.org.

  2. http://www.clarin.eu/cmdi/

  3. http://metadata-standards.org/11179/

  4. http://ontoiop.org/

  5. http://www.w3.org/standards/semanticweb/

  6. http://iso-commonlogic.org

  7. See http://www.oxford-royale.co.uk/news/2010/12/04/new-online-edition-of-oxford-english-dictionary.html

  8. http://lirics.loria.fr (retrieved 2012-08-30)

  9. The email address of the sender is exposed to the recipient, so only the first introduction email is mediated.

  10. The Athens Core group was named after the first meeting of a large number of metadata modellers for language resources, which took place in Athens in 2009. A series of (online) meetings resulted in a set of more than 200 metadata elements, which were made publicly available in the ISOcat DCR.

  11. See http://www.ttt.org/oscarstandards/tbx/tbx-basic.html

  12. http://www.clarin.eu

  13. Use of a Schema Registry (SCHEMAcat) will allow the persistent storage of resource schemata, each with a persistent identifier (PID) of its own. SCHEMAcat also allows the storage of different versions of a schema. ISOcat is related to SCHEMAcat, and there are also direct links between SCHEMAcat and RELcat; see also [11].

  14. The northern part of Belgium with Dutch as its official language.

  15. Only for DCs reflecting standards or contained in CLARIN-accepted/recommended.

  16. In such a case a new DC should be constructed (with the same name); the old one should get the status ‘superseded’ and be linked with the new one.

  17. A third, assigning two tags (a function-driven one plus a form-driven one), is rarely used.

  18. EAGLES: Expert Advisory Group on Language Engineering Standards, cf. especially http://www.ilc.cnr.it/EAGLES96/annotate/node24.html#SECTION00065000000000000000

  19. Cf. http://semantic-annotation.uvt.nl/ISO-TimeML-08-13-2008-vankiyong.pdf, Sect. 3.1 (retrieved 2012-08-31)

  20. For the time being, links with DCs for such concepts are mentioned in the note section.

  21. In the future such a DC will be explicitly recognizable as such.

  22. http://www.isocat.org/interface/index.html?view=CLARIN-NL/VL

References

  1. Budin G, Melby A (2000) Accessibility of multilingual terminological resources – current problems and prospects for the future. In: Proceedings of the second international conference on language resources and evaluation (LREC’00), Athens, Greece. ELRA

  2. ISO (2008) Annex ST (normative) procedure for the development and maintenance of standards in database format. Technical report, International Organization for Standardization, Geneva, Switzerland

  3. ISO:12200 (1999) Computer applications in terminology – machine-readable terminology interchange format (MARTIF). Technical report, International Organization for Standardization, Geneva, Switzerland

  4. ISO:12620 (1999) Computer applications in terminology – data categories. Technical report, International Organization for Standardization, Geneva, Switzerland

  5. ISO:12620 (2009) Terminology and other language and content resources – specification of data categories and management of a Data Category Registry for language resources. Technical report, International Organization for Standardization, Geneva, Switzerland

  6. ISO:16642 (2003) Computer applications in terminology – terminological markup framework. Technical report, International Organization for Standardization, Geneva, Switzerland

  7. ISO:24611 (2012) Language resource management – morpho-syntactic annotation framework (MAF). Technical report, International Organization for Standardization, Geneva, Switzerland

  8. ISO:30042 (2008) Systems to manage terminology, knowledge and content – TermBase eXchange (TBX). Technical report, International Organization for Standardization, Geneva, Switzerland

  9. Kemps-Snijders M, Ducret J, Romary L, Wittenburg P (2006) An API for accessing the Data Category Registry. In: Proceedings of the fifth international conference on language resources and evaluation (LREC’06), Genoa, Italy. ELRA

  10. Kuhn T (2000) The road since structure. University of Chicago Press, Chicago. Chapter: Commensurability, comparability, communicability

  11. Schuurman I, Windhouwer M (2011) Explicit semantics for enriched documents. What do ISOcat, RELcat and SCHEMAcat have to offer? In: Proceedings of the second supporting digital humanities conference, Copenhagen, Denmark

  12. Váradi T, Krauwer S, Wittenburg P, Wynne M, Koskenniemi K (2008) CLARIN: Common language resources and technology infrastructure. In: Proceedings of the sixth international conference on language resources and evaluation (LREC’08), Marrakech, Morocco. ELRA

  13. Windhouwer M (2012) RELcat: a Relation registry for ISOcat data categories. In: Proceedings of the eighth international conference on language resources and evaluation (LREC’12), Istanbul, Turkey. ELRA

  14. Wright SE, Budin G (2001) Handbook of terminology management: application-oriented terminology management. John Benjamins Publishing, Amsterdam

Author information

Corresponding author

Correspondence to Menzo Windhouwer.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wright, S.E., Windhouwer, M., Schuurman, I., Kemps-Snijders, M. (2013). Community Efforts Around the ISOcat Data Category Registry. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_13

  • DOI: https://doi.org/10.1007/978-3-642-35085-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science, Computer Science (R0)
