
An ontology for accessing transcription systems

  • Original Paper
Language Resources and Evaluation

Abstract

This paper presents the design and implementation of the Ontology for Accessing Transcription Systems (OATS), a knowledge base that supports interoperation over disparate transcription systems and practical orthographies. OATS uses RDF, SPARQL and Unicode to facilitate resource discovery and intelligent search over linguistic data. The knowledge base includes an ontological description of writing systems and relations for mapping transcription system segments to an interlingua pivot, the IPA. It includes orthographic and phonemic inventories from 203 African languages, which were mined from the Web. OATS is motivated by four use cases: querying data in the knowledge base via IPA, querying it in native orthography, error checking of digitized data, and conversion between transcription systems. The model in this paper implements each of these use cases.
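The pivot idea behind the four use cases can be illustrated with a minimal sketch. It is not the OATS implementation, which stores its mappings in RDF and queries them with SPARQL; plain Python dictionaries stand in for the knowledge base here. The two transcription systems, their grapheme inventories, and the function names `segment`, `to_ipa`, and `convert` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy mappings from graphemes to IPA segments.
# Neither system is taken from OATS; both are invented for this sketch.
SYSTEM_A = {"ny": "ɲ", "ch": "tʃ", "a": "a", "n": "n"}
SYSTEM_B = {"gn": "ɲ", "tj": "tʃ", "a": "a", "n": "n"}


def segment(text, graphemes):
    """Split text into known graphemes using greedy longest match."""
    ordered = sorted(graphemes, key=len, reverse=True)
    result = []
    i = 0
    while i < len(text):
        for g in ordered:
            if text.startswith(g, i):
                result.append(g)
                i += len(g)
                break
        else:
            # No grapheme matched: a simple form of error checking.
            raise ValueError(f"unknown grapheme at position {i}: {text[i]!r}")
    return result


def to_ipa(text, system):
    """Map a string in a transcription system to a list of IPA segments."""
    return [system[g] for g in segment(text, system)]


def convert(text, source, target):
    """Convert between two systems by pivoting through IPA."""
    ipa_to_target = {ipa: g for g, ipa in target.items()}
    return "".join(ipa_to_target[s] for s in to_ipa(text, source))


print(to_ipa("nyacha", SYSTEM_A))
print(convert("nyacha", SYSTEM_A, SYSTEM_B))
```

Because every system maps to the same pivot, adding a new transcription system requires only one new mapping to IPA rather than one mapping per existing system. The `ValueError` raised on an unmapped grapheme is a crude stand-in for the error-checking use case.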


Notes

  1. ISO 639-3 language codes are given in square brackets, [ ].

  2. http://linguistics-ontology.org/.

  3. Phonemic and phonetic representations are given in the International Phonetic Alphabet (IPA).

  4. Practical orthographies are intended to jump-start written materials development by correlating a writing system with its sound units, making it easier for speakers to master and acquire literacy.

  5. Kohrt (1986) provides a history of the term grapheme.

  6. The phoneme /d/ has morphologically conditioned allographs <d> (word-initial) and <r> (elsewhere) (McGill 2004).

  7. See Scannell, this volume, for discussion on the simplification of orthographies in African languages into plain ASCII.

  8. Guójiā Biāozhǔn, the national standard character set for the People’s Republic of China.

  9. ISO/IEC 10646.

  10. http://www.unicode.org/Public/UNIDATA/Scripts.txt.

  11. http://www.w3.org/2001/sw/.

  12. http://www.w3.org/XML/.

  13. http://www.w3.org/RDF/.

  14. http://www.w3.org/TR/rdf-sparql-query/.

  15. I have chosen to use a Document class instead of a Language class because transcription systems and phonemic inventories may vary between documents that describe the same language.

  16. http://www.language-archives.org/OLAC/metadata.html.

  17. http://dublincore.org/usage/terms/dcmitype/.

  18. http://sumale.vjf.cnrs.fr/phono/.

  19. For a list of these languages, see: http://phoible.org.

  20. http://protege.stanford.edu/.

  21. http://python.org.

  22. http://rdflib.net/.

  23. This is work that I am currently undertaking for the Phonetics Information Base and Lexicon project. See: http://phoible.org.

  24. http://phoible.org/oats/.

References

  • Avery, P., & Rice, K. (1989). Segment structure and coronal underspecification. Phonology, 6, 179–200.


  • Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. (2003). The description logic handbook: Theory, implementation, and applications. New York, NY: Cambridge University Press.

  • Baader, F., & Sattler, U. (2001). An overview of tableau algorithms for description logics. Studia Logica, 69(1), 5–40.

  • Baldwin, T., Bird, S., & Hughes, B. (2006). Collecting low-density language materials on the web. In Proceedings of the 12th Australasian world wide web conference (AusWeb06).

  • Beckett, D. (2004). RDF/XML syntax specification (Revised). Technical report, W3C.

  • Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557–582.


  • Blass, R. (1975). Sisaala-English, English-Sisaala dictionary. Tamale, Ghana: Institute of Linguistics.


  • Bodomo, A. (1997). The structure of Dagaare. Stanford monographs in African languages. Stanford, CA: CSLI Publications.

  • Calvanese, D., De Giacomo, G., Lenzerini, M., & Nardi, D. (2001). Reasoning in expressive description logics. In Handbook of automated reasoning, vol. II, (pp. 1581–1634). Amsterdam: Elsevier.

  • Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York, NY: Harper & Row.


  • Clements, G. N., & Hume, E. (1995). The internal organization of speech sounds. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 245–306). Cambridge, MA: Blackwell.


  • Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225–252.


  • Coulmas, F. (1999). The Blackwell encyclopedia of writing systems. Cambridge, MA: Blackwell.


  • Coulmas, F. (2003). Writing systems: An introduction to their analysis. Cambridge, UK: Cambridge University Press.


  • Daniels, P., & Bright, W. (1996). The world's writing systems. New York, NY: Oxford University Press.


  • Farrar, S., & Langendoen, T. (2003). A linguistic ontology for the semantic web. GLOT, 7(3), 97–100.


  • Farrar, S., & Lewis, W. (2005). The GOLD community of practice: An infrastructure for linguistic data on the web. In E-MELD 2005: Workshop on morphosyntactic annotation and terminology: Linguistic ontologies and data categories for language resources.

  • Gibbon, D., Hughes, B., & Trippel, T. (2005). Semantic decomposition of character encodings for linguistic knowledge discovery. In Proceedings of the 29th annual conference of the Gesellschaft für Klassifikation.

  • Gibbon, D., Hughes, B., & Trippel, T. (2007). The computational semantics of characters. In Proceedings of the 7th international workshop on computational semantics.

  • Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199–220.


  • Hartell, R. (1993). Alphabets des langues africaines. Dakar: UNESCO.


  • Jakobson, R., Fant, G., & Halle, M. (1952). Preliminaries to speech analysis. Cambridge, MA: MIT Press.


  • Kohrt, M. (1986). The term ‘Grapheme’ in the history and theory of linguistics. In G. Augst (Ed.), New trends in graphemics and orthography (pp. 80–96). Berlin: de Gruyter.

  • Lewis, W. (2006). ODIN: A model for adapting and enriching legacy infrastructure. In Proceedings of the e-Humanities workshop 2006: 2nd IEEE international conference on e-Science and grid computing.

  • McCarthy, J. (1988). Feature geometry and dependency: A review. Phonetica, 45, 84–108.

  • McGill, S. (2004). Focus and activation in Paasaal: The particle rε. Master’s thesis, University of Reading.

  • McGill, S., Fembeti, S., & Toupin, M. (1999). A grammar of Sisaala-Pasaale. Ghana: University of Ghana.

  • Moran, S. (2008). A grammatical sketch of Isaalo (Western Sisaala). Saarbrücken: VDM Verlag Dr Müller.


  • Sagey, E. (1986). The representation of features and relations in non-linear phonology. Ph.D. thesis, MIT.

  • Sampson, G. (1985). Writing systems. Stanford, CA: Stanford University Press.

  • Sowa, J. (2000). Knowledge representation. Pacific Grove, CA: Brooks/Cole.

  • Sproat, R. (2000). A computational theory of writing systems. Cambridge, UK: Cambridge University Press.


  • The Unicode Consortium. (2007). The Unicode standard, Version 5.0.

  • Toupin, M. (1995). The phonology of Sisaale-Pasaale. In Collected language notes, vol. 22. Ghana Institute of Linguistics, Literacy and Bible Translation.

  • Yergeau, F. (2006). Extensible markup language (XML) 1.0 (Fourth Edition).

  • Zuraw, K. (2006). Using the web as a phonological corpus: A case study from Tagalog. In Proceedings of the 2nd international workshop on web as corpus.


Acknowledgments

This work was supported in part by the Max-Planck-Institut für evolutionäre Anthropologie; thanks go to Bernard Comrie, Jeff Good, and Michael Cysouw. For helpful comments and reviews, I thank Emily Bender, Scott Farrar, Sharon Hargus, Will Lewis, Dan McCloy, Richard Wright, and three anonymous reviewers.

Author information


Corresponding author

Correspondence to Steven Moran.


About this article

Cite this article

Moran, S. An ontology for accessing transcription systems. Lang Resources & Evaluation 45, 345–360 (2011). https://doi.org/10.1007/s10579-011-9158-8


  • DOI: https://doi.org/10.1007/s10579-011-9158-8
