Building Word Representations for Wolof Using Neural Networks

  • Conference paper
  • In: Innovations and Interdisciplinary Solutions for Underserved Areas (InterSol 2020)

Abstract

Because a large portion of the population in rural areas of sub-Saharan Africa understands only local languages, these people do not have access to all the content available on the World Wide Web. Most content is available in English, Spanish, French, and other widely used languages. Content in low-resource languages such as Wolof, which is mostly spoken in Senegal, is scarce. Automatic systems for natural language understanding, such as machine translation systems that can transform information from common to low-resource languages, would allow people in rural areas to access relevant scientific or health content.

Nowadays, word representation is the preliminary step of natural language understanding models. This paper presents the investigations we conducted to build Wolof word representations using a corpus gathered from the Internet. We applied neural word embedding models to this Wolof corpus. These models are known to capture semantic and syntactic relations between words in the embedding space. The experiments we conducted suggest that, despite the limited corpus size, our models successfully capture relations between words.
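As a rough illustration of what capturing relations in an embedding space means in practice, the sketch below ranks words by cosine similarity to a query word. It is a minimal sketch, not the models or data used in the paper: the function names `cosine` and `nearest`, the placeholder tokens, and the three-dimensional vectors are all hypothetical and hand-made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hypothetical toy vectors, NOT trained on the Wolof corpus.
# word_a/word_b and word_c/word_d are placed close together by hand.
toy_embeddings = {
    "word_a": [0.9, 0.1, 0.0],
    "word_b": [0.8, 0.2, 0.1],
    "word_c": [0.0, 0.9, 0.4],
    "word_d": [0.1, 0.8, 0.5],
}

neighbours = nearest("word_a", toy_embeddings, k=1)
print(neighbours)
```

With trained embeddings, the same nearest-neighbour query is one common way to inspect whether related words end up close to each other.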

The authors thank the CEA MITIC for funding this work.

Corresponding author

Correspondence to Elhadji Mamadou Nguer.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Lo, A., Dione, C.M.B., Nguer, E.M., Ba, S.O., Lo, M. (2020). Building Word Representations for Wolof Using Neural Networks. In: Thorn, J., Gueye, A., Hejnowicz, A. (eds) Innovations and Interdisciplinary Solutions for Underserved Areas. InterSol 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 321. Springer, Cham. https://doi.org/10.1007/978-3-030-51051-0_20

  • DOI: https://doi.org/10.1007/978-3-030-51051-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-51050-3

  • Online ISBN: 978-3-030-51051-0

  • eBook Packages: Computer Science, Computer Science (R0)
