Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation

  • Conference paper
Computational Linguistics (PACLING 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 781)

Abstract

Domain adaptation consists in adapting Machine Translation (MT) systems designed for one domain to work in another. Multiword expressions generally characterize domain-specific vocabularies. Translating multiword expressions is a challenge for current Statistical Machine Translation (SMT) systems because corpus-based approaches are effective only when large amounts of parallel corpora are available. However, parallel corpora are available for only a limited number of language pairs and domains, and building corpora for several language pairs and domains is time-consuming and expensive. This paper describes an experimental evaluation of the impact of using a specialized bilingual lexicon of multiword expressions to obtain better domain adaptation for the state-of-the-art statistical machine translation system Moses. Our study concerns the English-French language pair and two kinds of texts: in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents). We introduce three methods to integrate extracted bilingual multiword expressions in Moses. We experimentally show that integrating specialized bilingual lexicons of multiword expressions improves the translation quality of Moses for both in-domain and out-of-domain texts.

Notes

  1. http://www.statmt.org/moses/

  2. http://www.statmt.org/moses/giza/GIZA++.html

  3. http://www.statmt.org/moses/?n=Moses.FactoredModels

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 700381.

Author information

Corresponding author

Correspondence to Nasredine Semmar.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Semmar, N., Laib, M. (2018). Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_9

  • DOI: https://doi.org/10.1007/978-981-10-8438-6_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8437-9

  • Online ISBN: 978-981-10-8438-6

  • eBook Packages: Computer Science (R0)
