New Kazakh Parallel Text Corpora with On-line Access

  • Conference paper
Computational Collective Intelligence (ICCCI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10449)

Abstract

This paper presents a new parallel resource for the Kazakh language: parallel text corpora with online access. We describe three different approaches to collecting parallel texts and how much data we managed to collect with them; the parallel Kazakh-English text corpora, collected from various sources and aligned at the sentence level; and a web-accessible corpus management system set up with open-source tools, namely the corpus manager Manatee and the web GUI KonText. As a result of our work, we present a working web-accessible corpus management system for the collected corpora.
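Sentence-level alignment pairs each source sentence (or a small group of sentences) with its translation. The sketch below is only an illustration of the idea, not the authors' pipeline: it assumes both texts are already split into sentences and uses a simplified length-based cost loosely inspired by the Gale-Church approach. The bead penalties in `BEADS` and the functions `bead_cost` and `align` are hypothetical names chosen for this example.

```python
import math
from typing import List, Tuple

# Allowed "beads": (number of source sentences, number of target sentences).
# The penalties are illustrative assumptions, not tuned values.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def bead_cost(src_len: int, tgt_len: int) -> float:
    """Length-mismatch cost for a bead with the given total character lengths."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # insertions/deletions are charged only via their bead penalty
    return abs(src_len - tgt_len) / math.sqrt(src_len + tgt_len)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Align two sentence lists by dynamic programming over bead types.

    Returns a list of beads, each a pair (source indices, target indices).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = best cost of aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + bead_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the beads in reading order.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    en = ["I went home.", "It was late."]
    kk = ["Кеш болғандықтан үйге қайттым."]
    print(align(en, kk))
```

Here the two short English sentences are matched to the single Kazakh sentence as one 2-1 bead. Practical aligners typically also use lexical evidence such as bilingual dictionaries, which this length-only sketch deliberately ignores.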

Author information

Corresponding author

Correspondence to Zhandos Zhumanov.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhumanov, Z., Madiyeva, A., Rakhimova, D. (2017). New Kazakh Parallel Text Corpora with On-line Access. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_48

  • DOI: https://doi.org/10.1007/978-3-319-67077-5_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67076-8

  • Online ISBN: 978-3-319-67077-5

  • eBook Packages: Computer Science, Computer Science (R0)