Inferring the Complete Set of Kazakh Endings as a Language Resource

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2020)

Abstract

Kazakh is a low-resource language. Applying modern techniques such as artificial intelligence, machine translation, summarization, and sentiment analysis to Kazakh requires more electronic language resources. Although neural machine translation (NMT) has shown impressive results for many of the world's languages, it does not solve the problem of low-resource languages. The development of resources and tools that improve the use of NMT for low-resource languages is therefore relevant. Effective use of NMT for Kazakh requires not only bilingual parallel corpora but also a reliable method for segmenting the source text. In the authors' opinion, one effective approach to source text segmentation is morphological segmentation. The authors propose to use a table of the complete set of Kazakh word endings for the morphological segmentation of Kazakh text. This paper describes how that complete set of endings is inferred. With such a table, the ending of a word can be segmented into suffixes in one step, by looking the ending up in the table. Because the table is derived by inferring the complete system of word endings of the language, it is intended to guarantee that any Kazakh word can be analyzed.
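The one-step lookup described in the abstract can be illustrated with a minimal Python sketch. The table entries, the `segment` function, and the `min_stem` parameter are hypothetical and chosen only for illustration; they are not the authors' table or implementation. The sketch assumes that each complete ending is stored together with its split into suffixes, and that the longest ending found in the table is preferred.

```python
# Toy ending table: each complete ending maps to its split into suffixes.
# The entries are illustrative assumptions, not the table from the paper.
ENDINGS = {
    "лар": ["лар"],           # plural
    "тар": ["тар"],           # plural (after a voiceless consonant)
    "ға": ["ға"],             # dative
    "да": ["да"],             # locative
    "ларға": ["лар", "ға"],   # plural + dative
    "тарда": ["тар", "да"],   # plural + locative
}


def segment(word, endings=ENDINGS, min_stem=2):
    """Return (stem, suffixes) using the longest ending present in the table.

    min_stem is a hypothetical guard that keeps at least that many
    characters in the stem.
    """
    # Moving the cut point left to right tries the longest ending first.
    for cut in range(min_stem, len(word)):
        ending = word[cut:]
        if ending in endings:
            return word[:cut], list(endings[ending])
    # No ending matched: treat the whole word as a stem.
    return word, []


if __name__ == "__main__":
    for w in ["балаларға", "кітаптарда", "бала"]:
        print(w, segment(w))
```

Because the whole ending is a single dictionary key, no rule cascade or iterative suffix stripping is needed at lookup time; how well this covers real text depends entirely on the completeness of the table, which is the resource the paper sets out to infer.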

Acknowledgements

This work was carried out under grant No. AP05131415 “Development and research of the neural machine translation system of Kazakh language”, funded by the Ministry of Education and Science of the Republic of Kazakhstan for 2018-2020.

Corresponding author

Correspondence to Ualsher Tukeyev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tukeyev, U., Karibayeva, A. (2020). Inferring the Complete Set of Kazakh Endings as a Language Resource. In: Hernes, M., Wojtkiewicz, K., Szczerbicki, E. (eds) Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham. https://doi.org/10.1007/978-3-030-63119-2_60

  • DOI: https://doi.org/10.1007/978-3-030-63119-2_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63118-5

  • Online ISBN: 978-3-030-63119-2

  • eBook Packages: Computer Science, Computer Science (R0)
