Opening Statistical Translation Engines to Terminological Resources

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2553)

Abstract

The past decade has witnessed exciting work in the field of Statistical Machine Translation (SMT). However, accurately evaluating its potential in real-life contexts remains an open question. In this study, we investigate the behavior of an SMT engine faced with a corpus that differs markedly from the one it was trained on. We show that terminological databases are obvious resources that should be used to boost the performance of a statistical engine. We propose and evaluate a way of integrating terminology into an SMT engine that yields a significant reduction in word error rate.
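
The abstract reports its result as a reduction in word error rate but does not define the metric here. As a minimal sketch, assuming the common definition (word-level edit distance between the system output and a reference translation, divided by the reference length), it could be computed as below. The function name and the whitespace tokenization are illustrative assumptions, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace; the
    paper's own tokenization and scoring setup are not described here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition a lower value is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.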

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Langlais, P. (2002). Opening Statistical Translation Engines to Terminological Resources. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_17

  • DOI: https://doi.org/10.1007/3-540-36271-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00307-6

  • Online ISBN: 978-3-540-36271-5

  • eBook Packages: Springer Book Archive