Polish-English Statistical Machine Translation of Medical Texts

Wołk, Krzysztof; Marasek, Krzysztof

doi:10.1007/978-3-319-10383-9_16

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 314)

Abstract

This research explores the effects of various training methods on a Polish-to-English statistical machine translation (SMT) system for medical texts. Parts of the EMEA parallel corpus from the OPUS project were used to train the phrase tables and language models, and to develop, tune, and test the translation system. The BLEU, NIST, METEOR, RIBES, and TER metrics were used to evaluate how different system configurations and data preparation steps affect translation results. Our experiments included systems that used POS tagging, factored phrase models, hierarchical models, syntactic taggers, and many different alignment methods. We also conducted a detailed analysis of the Polish data as preparatory work for automatic data correction, such as truecasing and punctuation normalization.

Author information

Authors and Affiliations

Department of Multimedia, Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008, Warszawa, Poland
Krzysztof Wołk & Krzysztof Marasek

Corresponding author

Correspondence to Krzysztof Wołk.

Editor information

Editors and Affiliations

Division of Information Systems, Institute of Informatics, Wroclaw University of Technology, Wrocław, Poland
Aleksander Zgrzywa
Division of Information Systems, Institute of Informatics, Wroclaw University of Technology, Wrocław, Poland
Kazimierz Choroś
Division of Information Systems, Institute of Informatics, Wroclaw University of Technology, Wrocław, Poland
Andrzej Siemiński

About this paper

Cite this paper

Wołk, K., Marasek, K. (2015). Polish-English Statistical Machine Translation of Medical Texts. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) New Research in Multimedia and Internet Systems. Advances in Intelligent Systems and Computing, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-319-10383-9_16

DOI: https://doi.org/10.1007/978-3-319-10383-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10382-2
Online ISBN: 978-3-319-10383-9
eBook Packages: Engineering, Engineering (R0)