Abstract
Phrase-based statistical models constitute one of the most competitive pattern-recognition approaches to machine translation. In these models, the source sentence is segmented into phrases, and each phrase is then translated using a stochastic dictionary. One shortcoming of the phrase-based model is its limited generalization capability: if a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in our dictionary (that is, it has not been seen in training), we look for the most similar phrase in the dictionary and try to adapt its translation to the source phrase. We use the well-known edit distance as the measure of similarity. We present results on an English-Spanish task (XRCE).
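The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes a word-level edit distance with unit costs and a plain mapping from source phrases to translations. The abstract does not specify the granularity of the distance, the dictionary structure, or how the retrieved translation is adapted, so the names `edit_distance` and `closest_phrase` and the toy entries are hypothetical, and the adaptation step is not shown.

```python
from typing import Dict, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over tokens, with unit cost per edit (assumption)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, token_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i tokens of a
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(
                prev[j] + 1,         # delete token_a
                curr[j - 1] + 1,     # insert token_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def closest_phrase(
    source: str, dictionary: Dict[str, str]
) -> Optional[Tuple[str, str, int]]:
    """Return (nearest known phrase, its translation, distance), or None.

    Ties are broken by dictionary order; a real system would likely also
    weigh translation probabilities, which this sketch ignores.
    """
    if not dictionary:
        return None
    source_tokens = source.split()
    best = min(
        dictionary,
        key=lambda phrase: edit_distance(source_tokens, phrase.split()),
    )
    distance = edit_distance(source_tokens, best.split())
    return best, dictionary[best], distance


# Toy entries invented for illustration; not taken from the XRCE corpus.
toy_dictionary = {
    "the printer is ready": "la impresora está lista",
    "open the cover": "abra la cubierta",
}
print(closest_phrase("the scanner is ready", toy_dictionary))
```

Here the unseen phrase "the scanner is ready" is one substitution away from a known entry, so that entry's translation becomes the candidate to adapt. A linear scan is used for clarity; a large phrase table would need an index or a distance cutoff.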
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Tomás, J., Lloret, J., Casacuberta, F. (2007). Phrase-Based Statistical Machine Translation Using Approximate Matching. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72847-4_61
DOI: https://doi.org/10.1007/978-3-540-72847-4_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72846-7
Online ISBN: 978-3-540-72847-4
eBook Packages: Computer Science, Computer Science (R0)