A Hybrid Approach to Compiling Bilingual Dictionaries of Medical Terms from Parallel Corpora

Kontonatsios, Georgios; Mihăilă, Claudiu; Korkontzelos, Ioannis; Thompson, Paul; Ananiadou, Sophia

doi:10.1007/978-3-319-11397-5_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing


Abstract

Existing bilingual dictionaries of technical terms suffer from limited coverage and are available for only a small number of language pairs. In response to these problems, we present a method for automatically constructing and updating bilingual dictionaries of medical terms by exploiting parallel corpora. We focus on the extraction of multi-word terms, which constitute a challenging problem for term alignment algorithms. We apply our method to two low-resource language pairs, namely English-Greek and English-Romanian, for which such resources did not previously exist in the medical domain. Our approach combines two term alignment models to improve the accuracy of the extracted medical term translations. Evaluation results show that the precision of our method is 86% for English-Greek and 81% for English-Romanian, considering only the highest-ranked candidate translation.
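The abstract says that two term alignment models are combined and that precision is measured on the highest-ranked candidate only, but it does not say how the two models' outputs are merged. The sketch below assumes, purely for illustration, that each model assigns a score in [0, 1] to a (source term, candidate translation) pair and that the hybrid score is a weighted average of the two. The names `Scorer`, `lookup_scorer`, `hybrid_rank`, `precision_at_1` and the `weight` parameter are hypothetical, and the toy scores are invented for the example; they are not the authors' models or results. The second function shows how a top-1 precision figure, like the 86% and 81% reported above, is computed from ranked candidates and a gold standard.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interface: a scorer maps (source term, candidate translation)
# to a score in [0, 1]. This is an assumption made for the sketch.
Scorer = Callable[[str, str], float]


def lookup_scorer(table: Dict[Tuple[str, str], float]) -> Scorer:
    """Wrap a fixed score table as a Scorer; unseen pairs score 0.0."""
    def score(source: str, target: str) -> float:
        return table.get((source, target), 0.0)
    return score


def hybrid_rank(
    source_term: str,
    candidates: List[str],
    model_a: Scorer,
    model_b: Scorer,
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidates by weight * model_a + (1 - weight) * model_b.

    Linear interpolation is an illustrative choice, not the paper's scheme.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    scored = [
        (cand, weight * model_a(source_term, cand)
         + (1.0 - weight) * model_b(source_term, cand))
        for cand in candidates
    ]
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def precision_at_1(
    rankings: Dict[str, List[Tuple[str, float]]],
    gold: Dict[str, Set[str]],
) -> float:
    """Share of source terms whose top-ranked candidate is in the gold set.

    Terms with no candidates propose no translation and are left out of the
    denominator, so this measures precision rather than recall.
    """
    proposed = {term: ranked[0][0] for term, ranked in rankings.items() if ranked}
    if not proposed:
        return 0.0
    hits = sum(1 for term, top in proposed.items() if top in gold.get(term, set()))
    return hits / len(proposed)


if __name__ == "__main__":
    # Invented toy scores standing in for two hypothetical alignment models.
    model_a = lookup_scorer({
        ("heart failure", "καρδιακή ανεπάρκεια"): 0.9,
        ("heart failure", "καρδιακή προσβολή"): 0.6,
        ("blood pressure", "αρτηριακή πίεση"): 0.4,
        ("blood pressure", "πίεση"): 0.7,
    })
    model_b = lookup_scorer({
        ("heart failure", "καρδιακή ανεπάρκεια"): 0.7,
        ("heart failure", "καρδιακή προσβολή"): 0.8,
        ("blood pressure", "αρτηριακή πίεση"): 0.5,
        ("blood pressure", "πίεση"): 0.6,
    })
    candidates = {
        "heart failure": ["καρδιακή ανεπάρκεια", "καρδιακή προσβολή"],
        "blood pressure": ["αρτηριακή πίεση", "πίεση"],
    }
    gold = {
        "heart failure": {"καρδιακή ανεπάρκεια"},
        "blood pressure": {"αρτηριακή πίεση"},
    }
    rankings = {
        term: hybrid_rank(term, cands, model_a, model_b)
        for term, cands in candidates.items()
    }
    print(precision_at_1(rankings, gold))  # prints 0.5
```

In the toy run, the combined scores put the correct translation first for "heart failure" but not for "blood pressure", so one of two proposed translations is correct. Weighted averaging is only the simplest way to merge two scorers; rank-based fusion or a classifier trained on both scores are common alternatives, and this abstract does not state which approach the paper uses.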




Acknowledgements

This work was funded by the European Community’s Seventh Framework Program (FP7/2007–2013) [grant number 318736 (OSSMETER)].

Author information

Authors and Affiliations

The National Centre for Text Mining, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
Georgios Kontonatsios, Claudiu Mihăilă, Ioannis Korkontzelos, Paul Thompson & Sophia Ananiadou


Corresponding author

Correspondence to Georgios Kontonatsios.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide


About this paper

Cite this paper

Kontonatsios, G., Mihăilă, C., Korkontzelos, I., Thompson, P., Ananiadou, S. (2014). A Hybrid Approach to Compiling Bilingual Dictionaries of Medical Terms from Parallel Corpora. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_4


DOI: https://doi.org/10.1007/978-3-319-11397-5_4
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)
