
Morphology Model and Segmentation for Old Turkic Language

  • Conference paper
Computational Collective Intelligence (ICCCI 2021)

Abstract

The Old Turkic language is the basis of all modern Turkic languages, and its study matters greatly to the Turkic peoples who speak them. It is important both from a historical point of view and for current research questions, such as neural machine translation and the linguistic distance of the modern Turkic languages from their progenitor. This paper proposes a computational model of Old Turkic morphology based on the CSE (Complete Set of Endings) model of morphology and, on this basis, studies the morphological segmentation of Old Turkic texts, which will subsequently be used for neural machine translation from Old Turkic into modern Turkic languages. Because most modern Turkic languages, with the exception of Turkish, are low-resource languages, developing computational models of morphology, as well as models, algorithms, and software for processing Turkic languages, remains a relevant task.
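As a rough illustration of what ending-based segmentation can look like, the sketch below splits a word into a stem and suffixes by looking up the longest matching ending in a table of endings. It is a minimal sketch under stated assumptions, not the authors' implementation: the `ENDINGS` table, the transliterated toy entries, the `segment` function, and the `min_stem` parameter are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy table: surface ending -> its split into suffixes.
# The entries are illustrative placeholders, not data from the paper.
ENDINGS = {
    "lar": ["lar"],
    "da": ["da"],
    "larda": ["lar", "da"],
}


def segment(word, endings=ENDINGS, min_stem=2):
    """Split a word into [stem, suffix, ...] using the longest known ending.

    min_stem is an assumed lower bound on stem length, used here only to
    avoid treating a whole short word as an ending.
    """
    # Moving the cut point from left to right tries the longest candidate
    # ending first and progressively shorter ones afterwards.
    for cut in range(min_stem, len(word)):
        ending = word[cut:]
        if ending in endings:
            return [word[:cut]] + endings[ending]
    # No known ending matched: return the word unsegmented.
    return [word]


if __name__ == "__main__":
    for w in ["atlarda", "atlar", "at"]:
        print(w, "->", " ".join(segment(w)))
```

A table that lists every admissible ending together with its segmentation keeps the lookup itself trivial; the linguistic effort goes into building the table, which this sketch does not attempt.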




Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhanabergenova, D., Tukeyev, U. (2021). Morphology Model and Segmentation for Old Turkic Language. In: Nguyen, N.T., Iliadis, L., Maglogiannis, I., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2021. Lecture Notes in Computer Science, vol 12876. Springer, Cham. https://doi.org/10.1007/978-3-030-88081-1_47


  • DOI: https://doi.org/10.1007/978-3-030-88081-1_47


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88080-4

  • Online ISBN: 978-3-030-88081-1

  • eBook Packages: Computer Science, Computer Science (R0)
