Abstract
Studies of speech-to-speech machine translation for Turkic languages are practically absent because parallel speech corpora for training neural models are difficult to create. The pressing problem, therefore, is the creation of synthetic parallel corpora for research on speech machine translation of Turkic languages. This work proposes a technology for forming Turkic parallel speech corpora based on the CSE (Complete Set of Endings) morphology model. The technology uses a cascade scheme: speech-to-text, text-to-text, and text-to-speech. A distinctive feature of this scheme is that the text-to-text phase uses a relational translation model based on the CSE morphology model. The scientific contribution of this work is the development of a technology for forming parallel speech corpora for the Kazakh-Uzbek language pair, based on a cascade scheme for speech machine translation with relational models. In the future, the resulting synthetic parallel speech corpora will be used to train neural speech machine translation models for Turkic languages.
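The cascade scheme named in the abstract can be illustrated with a minimal sketch. It shows only how the three phases chain together to produce one parallel corpus entry per source utterance. The names `ParallelEntry` and `build_parallel_corpus`, and the three stage callables, are hypothetical and introduced here for illustration; they are not the authors' implementation, and the actual recognition, CSE-based translation, and synthesis components would be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ParallelEntry:
    """One synthetic parallel corpus item (hypothetical structure)."""
    source_audio: bytes  # source-language speech, e.g. Kazakh
    source_text: str     # result of the speech-to-text phase
    target_text: str     # result of the text-to-text phase
    target_audio: bytes  # result of the text-to-speech phase, e.g. Uzbek


def build_parallel_corpus(
    utterances: Iterable[bytes],
    speech_to_text: Callable[[bytes], str],
    translate_text: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> List[ParallelEntry]:
    """Run each utterance through the three cascade phases in order."""
    corpus = []
    for audio in utterances:
        source_text = speech_to_text(audio)        # phase 1: speech-to-text
        target_text = translate_text(source_text)  # phase 2: text-to-text
        target_audio = text_to_speech(target_text) # phase 3: text-to-speech
        corpus.append(
            ParallelEntry(audio, source_text, target_text, target_audio)
        )
    return corpus
```

Passing the stages as callables keeps the sketch independent of any particular model: the text-to-text stage could wrap a relational translator and the other two stages could wrap any recognizer and synthesizer, without changing the loop.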
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Balabekova, T., Kairatuly, B., Tukeyev, U. (2023). Kazakh-Uzbek Speech Cascade Machine Translation on Complete Set of Endings. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_34
DOI: https://doi.org/10.1007/978-3-031-41774-0_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41773-3
Online ISBN: 978-3-031-41774-0
eBook Packages: Computer Science, Computer Science (R0)