Inferring the Complete Set of Kazakh Endings as a Language Resource

Tukeyev, Ualsher; Karibayeva, Aidana

doi:10.1007/978-3-030-63119-2_60

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1287)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Kazakh is a low-resource language. Applying modern fields such as artificial intelligence, machine translation, summarization and sentiment analysis to Kazakh requires more electronic language resources. Although neural machine translation (NMT) has shown impressive results for many of the world's languages, it does not solve the problem of low-resource languages. Developing resources and tools that improve the use of NMT for low-resource languages is therefore relevant. Using NMT effectively for Kazakh requires not only bilingual parallel corpora but also a good method for segmenting the source text. In the authors' view, morphological segmentation is one of the most effective ways to segment source text. The authors propose to segment Kazakh text morphologically using a table of the complete set of Kazakh word endings, and this paper describes how that complete set of endings is inferred. With such a table, the ending of a word can be segmented into its suffixes in a single step, by looking the ending up in the table. Because the table is obtained by inferring the complete system of word endings of the language, the complete set of endings guarantees that any Kazakh word can be analyzed.
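The idea of a complete ending table with one-step lookup can be illustrated with a small sketch. Everything below is an assumption for illustration, not the authors' method or data: the four suffix-type labels (K = plural, T = possessive, C = case, J = personal ending), the "canonical order" filter, the toy `ENDINGS` table and the helper names `all_type_sequences`, `is_canonical` and `segment` are hypothetical. The first part counts ordered selections of distinct suffix types, which gives 4 + 12 + 24 + 24 = 64 candidate sequences; the paper's own criteria for which combinations are valid are not reproduced here. The second part splits a word by finding the longest ending present in the table, which is a simplifying heuristic.

```python
from itertools import permutations

# Hypothetical labels for four nominal suffix types (assumption for illustration):
# K = plural, T = possessive, C = case, J = personal ending.
SUFFIX_TYPES = ("K", "T", "C", "J")
CANONICAL_ORDER = {t: i for i, t in enumerate(SUFFIX_TYPES)}


def all_type_sequences(types=SUFFIX_TYPES):
    """All ordered sequences of distinct suffix types (placements without repetition)."""
    return [seq for k in range(1, len(types) + 1) for seq in permutations(types, k)]


def is_canonical(seq):
    """Assumed filter (not the paper's criterion): types must follow K < T < C < J."""
    ranks = [CANONICAL_ORDER[t] for t in seq]
    return ranks == sorted(ranks)


# Toy ending table: a concrete ending maps to its suffixes and their types.
# This is a tiny hand-picked sample, not the paper's table of endings.
ENDINGS = {
    "тарымда": (("тар", "K"), ("ым", "T"), ("да", "C")),  # кітап+тар+ым+да "in my books"
    "лерімде": (("лер", "K"), ("ім", "T"), ("де", "C")),  # үй+лер+ім+де "in my houses"
    "ларға": (("лар", "K"), ("ға", "C")),                 # бала+лар+ға "to the children"
    "лар": (("лар", "K"),),
    "да": (("да", "C"),),
}


def segment(word, endings=ENDINGS):
    """Split word into (stem, suffixes) with a single lookup of the longest listed ending.

    The stem must be non-empty; if no ending matches, the word is returned unsegmented.
    """
    for i in range(1, len(word)):  # i = 1 tries the longest possible ending first
        ending = word[i:]
        if ending in endings:
            return word[:i], [suffix for suffix, _ in endings[ending]]
    return word, []


if __name__ == "__main__":
    sequences = all_type_sequences()
    print(len(sequences))                                  # 64
    print(len([s for s in sequences if is_canonical(s)]))  # 15
    print(segment("кітаптарымда"))  # ('кітап', ['тар', 'ым', 'да'])
    print(segment("балаларға"))     # ('бала', ['лар', 'ға'])
```

Because the table is keyed by the whole ending, the suffix boundaries come from one dictionary lookup rather than from stripping suffixes one at a time. The longest-match rule can still mis-split a stem that happens to end in a listed ending, so a practical analyzer would also need a stem lexicon or another disambiguation step.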

Acknowledgements

This work was carried out under grant No. AP05131415 “Development and research of the neural machine translation system of Kazakh language”, funded by the Ministry of Education and Science of the Republic of Kazakhstan for 2018-2020.

Author information

Authors and Affiliations

Al-Farabi Kazakh National University, al-Farabi Avenue, 71, 050040, Almaty, Kazakhstan
Ualsher Tukeyev & Aidana Karibayeva

Corresponding author

Correspondence to Ualsher Tukeyev.

Editor information

Editors and Affiliations

Wroclaw University of Economics and Business, Wrocław, Poland
Marcin Hernes
Wrocław University of Science and Technology, Wrocław, Poland
Krystian Wojtkiewicz
University of Newcastle, Newcastle, Australia
Edward Szczerbicki

About this paper

Cite this paper

Tukeyev, U., Karibayeva, A. (2020). Inferring the Complete Set of Kazakh Endings as a Language Resource. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_60

DOI: https://doi.org/10.1007/978-3-030-63119-2_60
Published: 19 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63118-5
Online ISBN: 978-3-030-63119-2
eBook Packages: Computer Science; Computer Science (R0)