Abstract
The past decade has witnessed exciting work in the field of Statistical Machine Translation (SMT). However, accurately evaluating its potential in real-life contexts remains an open question. In this study, we investigate the behavior of an SMT engine faced with a corpus that differs markedly from the one it was trained on. We show that terminological databases are obvious resources that should be used to boost the performance of a statistical engine. We propose and evaluate a way of integrating terminology into an SMT engine, which yields a significant reduction in word error rate.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Langlais, P. (2002). Opening Statistical Translation Engines to Terminological Resources. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_17
DOI: https://doi.org/10.1007/3-540-36271-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive