Abstract
Statistical Machine Translation is receiving increasing attention owing to the success of phrase-based alignment models. Despite their power, however, state-of-the-art systems built on these models have several disadvantages that reduce their effectiveness in working environments where computational time or space is limited. A finite-state framework is an attractive alternative because it is an efficient paradigm in which translation quality and real-time requirements can be balanced, yielding translation devices that are of practical help to their users. Here, we describe a way to use the bilingual information in a phrase-based model to implement a phrase-based n-gram model using finite-state transducers. The effort is worthwhile because finite-state transducers have notably lower computational requirements in practice than some well-known stack-decoding algorithms. Results are reported for the French-English EuroParl benchmark corpus from the 2006 Workshop on Machine Translation of the ACL.
This work has been partially supported by the EC (FEDER) and the Spanish projects TIN2006-15694-C02-01 and the Consolider Ingenio 2010 CSD2007-00018.
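The abstract does not spell out how the transducer is built, so the following is only a minimal sketch of the general idea of an n-gram model over bilingual phrase pairs, not the authors' implementation. It assumes the training data are already segmented monotonically into (source phrase, target phrase) pairs, uses a bigram model without smoothing, and treats each bigram as a transducer edge that reads the source phrase and emits the target phrase. The names `train_bigram`, `translate`, and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical start state; every path through the model begins here.
START = ("<s>", "<s>")


def train_bigram(segmented_corpus):
    """Estimate unsmoothed bigram probabilities over bilingual phrase pairs.

    Each training item is a list of (source_phrase, target_phrase) tuples,
    assumed to be a monotone segmentation of one sentence pair.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pairs in segmented_corpus:
        prev = START
        for pair in pairs:
            counts[prev][pair] += 1
            prev = pair
    # Each (prev -> pair) entry acts as a transducer edge: it reads the
    # source phrase, emits the target phrase, and moves to state `pair`.
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {pair: c / total for pair, c in following.items()}
    return model


def translate(model, source_words):
    """Viterbi search for the most probable path that reads `source_words`.

    Returns the emitted target string, or None when no path exists
    (possible here because the sketch has no smoothing or back-off).
    """
    n = len(source_words)
    # best[(position, state)] = (log probability, emitted target words)
    best = {(0, START): (0.0, [])}
    for i in range(n):
        for (pos, state), (logp, out) in list(best.items()):
            if pos != i:
                continue
            for pair, prob in model.get(state, {}).items():
                src = pair[0].split()
                j = i + len(src)
                if source_words[i:j] != src:
                    continue  # edge input does not match the sentence here
                cand = (logp + math.log(prob), out + pair[1].split())
                key = (j, pair)
                if key not in best or cand[0] > best[key][0]:
                    best[key] = cand
    finals = [v for (pos, _), v in best.items() if pos == n]
    if not finals:
        return None
    return " ".join(max(finals, key=lambda v: v[0])[1])


# Toy, made-up training data for illustration only.
corpus = [
    [("la maison verte", "the green house"), ("est grande", "is big")],
    [("la maison verte", "the green house"), ("est petite", "is small")],
    [("le chat", "the cat"), ("est petit", "is small")],
]
model = train_bigram(corpus)
print(translate(model, "la maison verte est petite".split()))
print(translate(model, "le chat est grande".split()))
```

The second call finds no path because the bigram from "le chat" to "est grande" was never observed. A practical system would need smoothing or back-off to handle such cases.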
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
González, J., Sanchis, G., Casacuberta, F. (2008). Learning Finite State Transducers Using Bilingual Phrases. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_35
DOI: https://doi.org/10.1007/978-3-540-78135-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)