The Multilingual Semantic Web as Virtual Knowledge Commons: The Case of the Under-Resourced South African Languages

Abstract

The participation of the under-resourced South African languages in the Multilingual Semantic Web as Virtual Knowledge Commons is imperative in terms of sharing in and contributing to the knowledge commons, in sustaining multilingualism and the technological development of these languages and in preserving cultural diversity and indigenous knowledge systems. This chapter takes a closer look at the challenges that the under-resourced languages of South Africa face in this regard and addresses two of these challenges. It is shown how three different types of high-quality language data, viz. multilingual terminology in English, Afrikaans, Tswana and Zulu; indigenous knowledge on astronomy nomenclature in Tswana; and a parallel corpus of English, Afrikaans, Tswana and Zulu could be exposed as Linked Data in a principled way. The conclusion contains various possibilities for future work.

Notes

  1. The plural form of people is used here to refer to groupings of persons sharing, for example, a culture.

  2. http://gama.unisa.ac.za/files/rdf/MSW-chapter-lex.

  3. http://gama.unisa.ac.za/files/rdf/MSW-chapter-IKS.

  4. Apart from the Bible, the Constitution is one of a small number of high-quality parallel corpora, available in all 11 official languages.

  5. http://gama.unisa.ac.za/files/MSW-chapter-annotations.pdf.

  6. http://gama.unisa.ac.za/files/MSW-chapter-morphTags.pdf.

  7. http://lfg-demo.computing.dcu.ie/lfgparser.html.

  8. http://gama.unisa.ac.za/files/MSW-chapter-annotations.pdf.

Author information

Correspondence to Laurette Pretorius.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pretorius, L. (2014). The Multilingual Semantic Web as Virtual Knowledge Commons: The Case of the Under-Resourced South African Languages. In: Buitelaar, P., Cimiano, P. (eds) Towards the Multilingual Semantic Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43585-4_4

  • DOI: https://doi.org/10.1007/978-3-662-43585-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43584-7

  • Online ISBN: 978-3-662-43585-4

  • eBook Packages: Computer Science, Computer Science (R0)
