Abstract
Old Turkic is the basis of all modern Turkic languages, so its study matters to the peoples who speak them. It is important both historically and for current problems such as neural machine translation and measuring the linguistic distance between the modern Turkic languages and their progenitor. This paper proposes a computational model of Old Turkic morphology based on the CSE (Complete Set of Endings) model of morphology and, on that basis, studies the morphological segmentation of Old Turkic texts; the segmentation will subsequently be used for neural machine translation from Old Turkic into modern Turkic languages. Because most modern Turkic languages, with the exception of Turkish, are low-resource languages, developing computational models of morphology, along with models, algorithms, and software for processing Turkic languages, remains a relevant task.
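To make the idea of ending-based segmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a complete set of endings is available as a plain set of strings (including composite endings such as plural plus case) and splits a word into stem and ending by longest match. The function name `segment`, the `min_stem` parameter, the toy inventory `TOY_ENDINGS`, and the transliterated word forms are all hypothetical and chosen for illustration only.

```python
def segment(word, endings, min_stem=2):
    """Split `word` into (stem, ending) by longest matching ending.

    `endings` is assumed to be a complete set of endings, so composite
    endings (e.g. plural + case) appear in it as single entries.
    Returns (word, "") when no ending matches.
    """
    # Try longer endings first so a composite ending wins over its parts.
    for ending in sorted(endings, key=len, reverse=True):
        if not ending:
            continue  # ignore an empty string in the inventory
        stem_length = len(word) - len(ending)
        if word.endswith(ending) and stem_length >= min_stem:
            return word[:stem_length], ending
    return word, ""


# Hypothetical toy inventory, NOT taken from the paper.
TOY_ENDINGS = {"lar", "da", "larda"}

if __name__ == "__main__":
    for w in ["atlarda", "atlar", "at"]:
        stem, ending = segment(w, TOY_ENDINGS)
        print(f"{w} -> {stem} + {ending}" if ending else f"{w} -> {w}")
```

Treating a composite ending as a single unit, rather than stripping one suffix at a time, is the design choice this sketch illustrates; a real system would also need to handle vowel harmony variants and stem ambiguity, which are omitted here.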
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhanabergenova, D., Tukeyev, U. (2021). Morphology Model and Segmentation for Old Turkic Language. In: Nguyen, N.T., Iliadis, L., Maglogiannis, I., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2021. Lecture Notes in Computer Science, vol 12876. Springer, Cham. https://doi.org/10.1007/978-3-030-88081-1_47
DOI: https://doi.org/10.1007/978-3-030-88081-1_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88080-4
Online ISBN: 978-3-030-88081-1
eBook Packages: Computer Science (R0)