Abstract
This paper presents the design and implementation of the Ontology for Accessing Transcription Systems (OATS), a knowledge base that supports interoperation over disparate transcription systems and practical orthographies. OATS uses RDF, SPARQL and Unicode to facilitate resource discovery and intelligent search over linguistic data. The knowledge base includes an ontological description of writing systems and relations for mapping transcription system segments to an interlingua pivot, the IPA. It includes orthographic and phonemic inventories from 203 African languages, which were mined from the Web. OATS is motivated by four use cases: querying data in the knowledge base via IPA, querying it in native orthography, error checking of digitized data, and conversion between transcription systems. The model in this paper implements each of these use cases.
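The pivot idea behind the conversion and error-checking use cases can be illustrated with a minimal sketch. It is not the OATS implementation, which uses RDF and SPARQL. Plain Python dictionaries stand in for the knowledge base, and the two transcription systems (`SYSTEM_A`, `SYSTEM_B`), their mappings, and the function names are invented for illustration. The sketch also assumes that each IPA symbol corresponds to exactly one grapheme in the target system.

```python
# Toy illustration only: both systems and their grapheme-to-IPA mappings
# are invented and are not taken from the OATS knowledge base.
SYSTEM_A = {"ny": "ɲ", "sh": "ʃ", "a": "a", "n": "n", "s": "s", "i": "i"}
SYSTEM_B = {"ñ": "ɲ", "š": "ʃ", "a": "a", "n": "n", "s": "s", "i": "i"}


def segment(text, system):
    """Split text into graphemes of `system` by greedy longest match.

    Returns (graphemes, unknown); `unknown` collects characters that the
    system does not define, which is a simple form of error checking.
    """
    longest = max(len(g) for g in system)
    graphemes, unknown, i = [], [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in system:
                graphemes.append(chunk)
                i += size
                break
        else:
            # No grapheme of any length matches at this position.
            unknown.append(text[i])
            i += 1
    return graphemes, unknown


def to_ipa(text, system):
    """Map text written in `system` to a list of IPA segments."""
    graphemes, unknown = segment(text, system)
    if unknown:
        raise ValueError(f"undefined segments: {unknown}")
    return [system[g] for g in graphemes]


def convert(text, source, target):
    """Convert text from `source` to `target` through the IPA pivot."""
    # Assumes a one-to-one relation between IPA symbols and target graphemes.
    ipa_to_target = {ipa: g for g, ipa in target.items()}
    return "".join(ipa_to_target[p] for p in to_ipa(text, source))
```

With these toy mappings, `convert("nyasha", SYSTEM_A, SYSTEM_B)` passes through the IPA segments ɲ, a, ʃ, a before being respelled in the second system, and `segment("nyaxa", SYSTEM_A)` reports `x` as undefined. Because every system is mapped only to the pivot, adding a new system requires one mapping rather than one per existing system.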
Notes
ISO 639-3 language codes are in [ ].
Phonemic and phonetic representations are given in the International Phonetic Alphabet (IPA).
Practical orthographies are intended to jump-start the development of written materials by correlating a writing system with the sound units of the language, making it easier for speakers to master the system and acquire literacy.
Kohrt (1986) provides a history of the term grapheme.
The phoneme /d/ has morphologically conditioned allographs <d> (word initial) or <r> (elsewhere) (McGill 2004).
See Scannell, this volume, for a discussion of the simplification of orthographies in African languages into plain ASCII.
Guójiā Biāozhǔn, the national standard character set for the People’s Republic of China.
ISO/IEC 10646.
I have chosen to use a Document class instead of a Language class because transcription systems and phonemic inventories may vary between documents that describe the same language.
For a list of these languages, see: http://phoible.org.
This is work that I am currently undertaking for the Phonetics Information Base and Lexicon project. See: http://phoible.org.
References
Avery, P., & Rice, K. (1989). Segment structure and coronal underspecification. Phonology, 6, 179–200.
Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. (2003). The description logic handbook: Theory, implementation, and applications. New York, NY: Cambridge University Press.
Baader, F., & Sattler, U. (2001). An overview of Tableau Algorithms for description logics. Studia Logica, 69(1), 5–40.
Baldwin, T., Bird, S., & Hughes, B. (2006). Collecting low-density language materials on the web. In Proceedings of the 12th Australasian world wide web conference (AusWeb06).
Beckett, D. (2004). RDF/XML syntax specification (Revised). Technical report, W3C.
Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557–582.
Blass, R. (1975). Sisaala-English, English-Sisaala dictionary. Tamale, Ghana: Institute of Linguistics.
Bodomo, A. (1997). The structure of Dagaare. Stanford monographs in African languages. Stanford, CA: CSLI Publications.
Calvanese, D., De Giacomo, G., Lenzerini, M., & Nardi, D. (2001). Reasoning in expressive description logics. In Handbook of automated reasoning, vol. II, (pp. 1581–1634). Amsterdam: Elsevier.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York, NY: Harper & Row.
Clements, G. N., & Hume, E. (1995). The internal organization of speech sounds. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 245–306). Cambridge, MA: Blackwell.
Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225–252.
Coulmas, F. (1999). The Blackwell encyclopedia of writing systems. Cambridge, MA: Blackwell.
Coulmas, F. (2003). Writing systems: An introduction to their analysis. Cambridge, UK: Cambridge University Press.
Daniels, P., & Bright, W. (1996). The world's writing systems. New York, NY: Oxford University Press.
Farrar, S., & Langendoen, T. (2003). A linguistic ontology for the semantic web. GLOT, 7(3), 97–100.
Farrar, S., & Lewis, W. (2005). The GOLD community of practice: An infrastructure for linguistic data on the web. In E-MELD 2005: Workshop on morphosyntactic annotation and terminology: Linguistic ontologies and data categories for language resources.
Gibbon, D., Hughes, B., & Trippel, T. (2005). Semantic decomposition of character encodings for linguistic knowledge discovery. In Proceedings of the 29th annual conference of the Gesellschaft für Klassifikation.
Gibbon, D., Hughes, B., & Trippel, T. (2007). The computational semantics of characters. In Proceedings of the 7th international workshop on computational semantics.
Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5, 199–220.
Hartell, R. (1993). Alphabets des langues africaines. Dakar: UNESCO.
Jakobson, R., Fant, G., & Halle, M. (1952). Preliminaries to speech analysis. Cambridge, MA: MIT Press.
Kohrt, M. (1986). The term ‘Grapheme’ in the history and theory of linguistics. In G. Augst (Ed.), New trends in graphemics and orthography (pp. 80–96). Berlin: de Gruyter.
Lewis, W. (2006). ODIN: A model for adapting and enriching legacy infrastructure. In Proceedings of the e-Humanities workshop 2006: 2nd IEEE international conference on e-Science and grid computing.
McCarthy, J. (1988). Feature geometry and dependency: A review. Phonetica, 45, 84–108.
McGill, S. (2004). Focus and activation in Paasaal: The particle rε. Master’s thesis, University of Reading.
McGill, S., Fembeti, S., & Toupin, M. (1999). A grammar of Sisaala-Pasaale. Ghana: University of Ghana.
Moran, S. (2008). A grammatical sketch of Isaalo (Western Sisaala). Saarbrücken: VDM Verlag Dr Müller.
Sagey, E. (1986). The representation of features and relations in non-linear phonology. Ph.D. thesis, MIT.
Sampson, G. (1985). Writing systems. Stanford, CA: Stanford University Press.
Sowa, J. (2000). Knowledge representation. Pacific Grove, CA: Brooks/Cole.
Sproat, R. (2000). A computational theory of writing systems. Cambridge, UK: Cambridge University Press.
The Unicode Consortium. (2007). The Unicode standard, Version 5.0.
Toupin, M. (1995). The phonology of Sisaale-Pasaale. In Collected language notes, vol. 22. Ghana Institute of Linguistics, Literacy and Bible Translation.
Yergeau, F. (2006). Extensible markup language (XML) 1.0 (Fourth Edition).
Zuraw, K. (2006). Using the web as a phonological corpus: A case study from Tagalog. In Proceedings of the 2nd international workshop on web as corpus.
Acknowledgments
This work was supported in part by the Max-Planck-Institut für evolutionäre Anthropologie, and thanks go to Bernard Comrie, Jeff Good and Michael Cysouw. For helpful comments and reviews, I thank Emily Bender, Scott Farrar, Sharon Hargus, Will Lewis, Dan McCloy, Richard Wright, and three anonymous reviewers.
Cite this article
Moran, S. An ontology for accessing transcription systems. Lang Resources & Evaluation 45, 345–360 (2011). https://doi.org/10.1007/s10579-011-9158-8
DOI: https://doi.org/10.1007/s10579-011-9158-8