Abstract
Deep Neural Networks have proven highly effective in many NLP tasks across various languages. Unfortunately, some resource-scarce languages, such as Tigrinya, still receive too little attention; as a result, many NLP applications for them, including part-of-speech tagging, remain in their early stages. The main objective of this research is therefore to offer effective part-of-speech tagging solutions for the Tigrinya language, for which only a rather small training corpus is available.
In this paper, Deep Neural Network (DNN) classifiers (i.e., Feed Forward Neural Network, Long Short-Term Memory (LSTM), Bidirectional LSTM and Convolutional Neural Network) are investigated by applying them on top of trained distributional neural word2vec embeddings. In search of the most accurate solutions, the DNN models are optimized both manually and automatically. Although automatic hyper-parameter optimization performs well with the Convolutional Neural Network, the manually optimized Bidirectional Long Short-Term Memory method achieves the highest overall accuracy, equal to 0.91 (i.e., 91%).
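For orientation, overall accuracy in part-of-speech tagging is conventionally the share of tokens whose predicted tag matches the gold-standard tag. The abstract does not spell out the metric, so the sketch below assumes that token-level definition; the function name `tagging_accuracy`, the tag labels, and the toy sentences are hypothetical and are not taken from the paper or from the Nagaoka Tigrinya Corpus.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence needs exactly one predicted tag per token")
        # Count the positions where the predicted tag agrees with the gold tag.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty corpus")
    return correct / total


# Hypothetical toy data (illustrative tags only): 5 tokens, 4 tagged correctly.
gold = [["N", "V", "ADJ"], ["PRO", "V"]]
pred = [["N", "V", "N"], ["PRO", "V"]]

accuracy = tagging_accuracy(gold, pred)
print(f"accuracy = {accuracy:.2f} ({accuracy:.0%})")  # accuracy = 0.80 (80%)
```

The same value can be reported either as a fraction or as a percentage, so a fraction of 0.91 corresponds to 91% of tokens tagged correctly.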
Notes
1. Available at http://crubadan.org/languages/ti; the word list compiled by Biniam Gebremichael's web crawler is available at http://www.cs.ru.nl/biniam/geez/crawl.php.
2. Available at https://eng.jnlp.org/yemane/ntigcorpus.
3. The plot_model function in Keras was used to depict this and subsequent models.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tesfagergish, S.G., Kapociute-Dzikiene, J. (2020). Deep Learning-Based Part-of-Speech Tagging of the Tigrinya Language. In: Lopata, A., Butkienė, R., Gudonienė, D., Sukackė, V. (eds) Information and Software Technologies. ICIST 2020. Communications in Computer and Information Science, vol 1283. Springer, Cham. https://doi.org/10.1007/978-3-030-59506-7_29
DOI: https://doi.org/10.1007/978-3-030-59506-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59505-0
Online ISBN: 978-3-030-59506-7
eBook Packages: Computer Science, Computer Science (R0)