Learning Word Alignment Models for Kazakh-English Machine Translation

Kartbayev, Amandyk

doi:10.1007/978-3-319-25135-6_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9376)

Included in the following conference series:

International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making

Abstract

This paper addresses the most essential challenges in word alignment quality. Word alignment is widely used in machine translation, yet little research has been dedicated to revealing its discrete properties. We present word segmentation, the probability distributions, and the statistical properties of word alignment on a transparent dataset and a real-life dataset. The results suggest that there is no single best method for alignment evaluation. For the Kazakh-English pair, we attempted to improve the phrase tables through the choice of alignment method, which needs to be adapted to the requirements of the specific project. Experimental results show that the processed parallel data reduced the word alignment error rate and achieved the highest BLEU improvement on the random parallel corpora.
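
As an illustration of the word alignment error rate mentioned in the abstract, the following is a minimal Python sketch of the commonly used definition AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of hypothesized links, S the sure gold links, and P the possible gold links (with S contained in P). The abstract does not state which formulation the paper uses, so this definition, the function name `alignment_error_rate`, and the toy link sets are assumptions made for illustration only.

```python
from typing import Set, Tuple

# A link pairs a source word index with a target word index.
Link = Tuple[int, int]


def alignment_error_rate(
    hypothesis: Set[Link], sure: Set[Link], possible: Set[Link]
) -> float:
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Assumption: this is the standard sure/possible formulation; the
    paper's exact evaluation setup is not given in the abstract.
    """
    # By convention every sure link is also a possible link.
    possible = possible | sure
    denominator = len(hypothesis) + len(sure)
    if denominator == 0:
        # Nothing proposed and nothing expected: treat as no error.
        return 0.0
    matched_sure = len(hypothesis & sure)
    matched_possible = len(hypothesis & possible)
    return 1.0 - (matched_sure + matched_possible) / denominator


# Hypothetical toy data: three proposed links, two sure gold links,
# and one additional possible gold link.
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}
proposed_links = {(0, 0), (1, 1), (2, 3)}

print(round(alignment_error_rate(proposed_links, sure_links, possible_links), 3))
```

In this toy case two of the three proposed links match sure links and the third matches nothing, so the score is 1 - 4/5. A lower value indicates closer agreement with the gold alignment.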


Author information

Authors and Affiliations

Laboratory of Intelligent Information Systems, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Amandyk Kartbayev


Corresponding author

Correspondence to Amandyk Kartbayev.

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Japan
Van-Nam Huynh
Graduate School of Science, Dept of Systems Innovation, Osaka University, Osaka, Japan
Masahiro Inuiguchi
Université de Technologie de Compiègne, Compiègne, France
Thierry Denoeux

About this paper

Cite this paper

Kartbayev, A. (2015). Learning Word Alignment Models for Kazakh-English Machine Translation. In: Huynh, V.-N., Inuiguchi, M., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2015. Lecture Notes in Computer Science, vol 9376. Springer, Cham. https://doi.org/10.1007/978-3-319-25135-6_31

DOI: https://doi.org/10.1007/978-3-319-25135-6_31
Published: 01 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25134-9
Online ISBN: 978-3-319-25135-6
eBook Packages: Computer Science, Computer Science (R0)
