Abstract
This paper presents a new parallel resource for the Kazakh language: parallel text corpora with on-line access. We describe three different approaches to collecting parallel texts and how much data we managed to collect with each of them. We also describe the parallel Kazakh-English text corpora, collected from various sources and aligned at the sentence level, and the web-accessible corpus management system that we set up using open-source tools, namely the corpus manager Manatee and the web GUI KonText. As a result of our work, we present a working web-accessible corpus management system for the collected corpora.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhumanov, Z., Madiyeva, A., Rakhimova, D. (2017). New Kazakh Parallel Text Corpora with On-line Access. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol. 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_48
DOI: https://doi.org/10.1007/978-3-319-67077-5_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5
eBook Packages: Computer Science; Computer Science (R0)