Deep Learning-Based Part-of-Speech Tagging of the Tigrinya Language

Tesfagergish, Senait Gebremichael; Kapociute-Dzikiene, Jurgita

doi:10.1007/978-3-030-59506-7_29

Senait Gebremichael Tesfagergish &
Jurgita Kapociute-Dzikiene

Conference paper
First Online: 08 October 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1283)

Abstract

Deep Neural Networks have proven highly effective in many NLP tasks across various languages. Unfortunately, some resource-scarce languages, such as Tigrinya, still receive too little attention, so many NLP applications, including part-of-speech tagging, remain in their early stages. Consequently, the main objective of this research is to offer effective part-of-speech tagging solutions for the Tigrinya language given a rather small training corpus.

In this paper, Deep Neural Network (DNN) classifiers (i.e., Feed Forward Neural Network, Long Short-Term Memory (LSTM), Bidirectional LSTM, and Convolutional Neural Network) are investigated by applying them on top of trained distributional neural word2vec embeddings. In search of the most accurate solutions, the DNN models are optimized both manually and automatically. Although automatic hyper-parameter optimization performs well with the Convolutional Neural Network, the manually tuned Bidirectional Long Short-Term Memory method achieves the highest overall accuracy, equal to 0.91 (i.e., 91%).
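To make the reported figure concrete, the following is a minimal sketch of how an overall tagging accuracy can be computed, assuming it is measured per token as the share of tokens whose predicted tag matches the gold tag. The function name `tagging_accuracy` and the toy tags are hypothetical illustrations; they are not taken from the paper or from the Nagaoka Tigrinya Corpus tagset.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence must have one predicted tag per gold tag")
        # Count positions where the two tag sequences agree.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Toy data (hypothetical tags): two sentences, five tokens, one tagging error.
gold = [["N", "V", "ADJ"], ["PRO", "V"]]
pred = [["N", "V", "N"], ["PRO", "V"]]

acc = tagging_accuracy(gold, pred)
print(f"accuracy = {acc:.2f} ({acc:.0%})")
```

Under this definition an accuracy of 0.91 is a proportion and corresponds to 91% of tokens tagged correctly.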

Notes

1. Available at http://crubadan.org/languages/ti; the word list compiled by Biniam Gebremichael's web crawler is available at http://www.cs.ru.nl/biniam/geez/crawl.php.
2. Available at https://eng.jnlp.org/yemane/ntigcorpus.
3. The plot_model function in Keras was used to represent this and subsequent models.

References

Amsalu, S., Gibbon, D.: Finite state morphology of Amharic. In: International Conference on Recent Advances in Natural Language Processing, pp. 47–51 (2005)
Chollet, F.: Keras: deep learning library for Theano and TensorFlow (2015). https://keras.io/. Accessed Mar 2020
Gebregzabiher, T.: Part of speech tagger for Tigrigna language. Master's thesis, Department of Computer Science, Addis Ababa University (2010)
Hyperas: Keras + Hyperopt: A Very Simple Wrapper for Convenient Hyperparameter Optimization. https://github.com/maxpumperla/hyperas. Accessed Mar 2020
Keleta, Y., Yamamoto, K., Marasinghe, A.: Nagaoka Tigrinya Corpus: Design and Development of Part-of-Speech Tagged Corpus. The Association for Natural Language Processing, pp. 413–416 (2016)
Keleta, Y., Yamamoto, K., Marasinghe, A.: Tigrinya part-of-speech tagging with morphological patterns and the New Nagaoka Tigrinya Corpus. Int. J. Comput. Appl. 146(14), 33–41 (2016). https://doi.org/10.5120/ijca2016910943
McNemar, Q.: Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12(2), 153–157 (1947). https://doi.org/10.1007/bf02295996. PMID 20254758
Nwankpa, Ch., Ijomah, W., Gachagan, A., Marshall, S.: Activation Functions: Comparison of Trends in Practice and Research for Deep Learning (2018). arXiv:1811.03378v1
Řehůřek, R., Sojka, P.: Software framework for topic modeling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50 (2010). https://doi.org/10.13140/2.1.2393.1847
TensorFlow. https://www.tensorflow.org/. Accessed Mar 2020

Author information

Authors and Affiliations

Vytautas Magnus University, K. Donelaičio, 44248, Kaunas, Lithuania
Senait Gebremichael Tesfagergish & Jurgita Kapociute-Dzikiene

Corresponding author

Correspondence to Senait Gebremichael Tesfagergish.

Editor information

Editors and Affiliations

Kaunas University of Technology, Kaunas, Lithuania
Audrius Lopata, Rita Butkienė, Daina Gudonienė & Vilma Sukackė

About this paper

Cite this paper

Tesfagergish, S.G., Kapociute-Dzikiene, J. (2020). Deep Learning-Based Part-of-Speech Tagging of the Tigrinya Language. In: Lopata, A., Butkienė, R., Gudonienė, D., Sukackė, V. (eds) Information and Software Technologies. ICIST 2020. Communications in Computer and Information Science, vol 1283. Springer, Cham. https://doi.org/10.1007/978-3-030-59506-7_29

DOI: https://doi.org/10.1007/978-3-030-59506-7_29
Published: 08 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59505-0
Online ISBN: 978-3-030-59506-7
eBook Packages: Computer Science, Computer Science (R0)
