Polish-English Statistical Machine Translation of Medical Texts

  • Conference paper
New Research in Multimedia and Internet Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 314)

Abstract

This research explores the effects of various training methods on a Polish-to-English statistical machine translation system for medical texts. Parts of the EMEA parallel text corpus from the OPUS project were used to train the phrase tables and language models and to develop, tune, and test the translation system. The BLEU, NIST, METEOR, RIBES, and TER metrics were used to evaluate the effects of various system and data preparations on translation results. Our experiments included systems that used POS tagging, factored phrase models, hierarchical models, syntactic taggers, and many different alignment methods. We also conducted a deep analysis of the Polish data as preparatory work for automatic data correction, such as true casing and punctuation normalization.

Author information

Corresponding author

Correspondence to Krzysztof Wołk.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wołk, K., Marasek, K. (2015). Polish-English Statistical Machine Translation of Medical Texts. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) New Research in Multimedia and Internet Systems. Advances in Intelligent Systems and Computing, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-319-10383-9_16

  • DOI: https://doi.org/10.1007/978-3-319-10383-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10382-2

  • Online ISBN: 978-3-319-10383-9

  • eBook Packages: Engineering, Engineering (R0)
