Abstract
This study explores the effects of various training methods on a Polish-to-English statistical machine translation system for medical texts. Various parts of the EMEA parallel corpus from the OPUS project were used to train phrase tables and language models and to develop, tune, and test the translation system. The BLEU, NIST, METEOR, RIBES, and TER metrics were used to evaluate the effects of different system configurations and data preparation steps on translation results. Our experiments included systems that used POS tagging, factored phrase models, hierarchical models, syntactic taggers, and a variety of alignment methods. We also conducted an in-depth analysis of the Polish data as preparatory work for automatic data correction, such as truecasing and punctuation normalization.
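To make the evaluation step more concrete, the following is a minimal sketch of how a BLEU-style score can be computed for one candidate translation against one reference. It is an illustration only, not the scoring tool used in the paper: the function names `ngrams` and `bleu` are hypothetical, tokenization is plain whitespace splitting, there is no smoothing, and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference, without smoothing.

    Assumption: whitespace tokenization and uniform n-gram weights.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, a zero precision makes the whole score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Invented example sentences, for illustration only.
    hyp = "take one tablet twice a day with food"
    ref = "take one tablet twice a day with a meal"
    print(bleu(hyp, ref))
```

Corpus-level BLEU, as normally reported for machine translation systems, aggregates the clipped counts over all sentences before taking the geometric mean, so this per-sentence sketch will not reproduce published scores.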
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wołk, K., Marasek, K. (2015). Polish-English Statistical Machine Translation of Medical Texts. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) New Research in Multimedia and Internet Systems. Advances in Intelligent Systems and Computing, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-319-10383-9_16
DOI: https://doi.org/10.1007/978-3-319-10383-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10382-2
Online ISBN: 978-3-319-10383-9
eBook Packages: Engineering, Engineering (R0)