Phrase-Based Statistical Machine Translation Using Approximate Matching

Tomás, Jesús; Lloret, Jaime; Casacuberta, Francisco

doi:10.1007/978-3-540-72847-4_61

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4477)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

Phrase-based statistical models are among the most competitive pattern-recognition approaches to machine translation. In these models, the source sentence is segmented into phrases, and each phrase is then translated using a stochastic dictionary. One shortcoming of the phrase-based model is its limited generalization capability: if a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in our dictionary (that is, it has not been seen in training), we look for the most similar phrase in the dictionary and try to adapt its translation to the source phrase. We use the well-known edit distance as the measure of similarity. We present results on an English-Spanish task (XRCE).
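The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes a word-level Levenshtein distance with unit costs and a plain dictionary from source phrases to translations. The function names and the toy dictionary are hypothetical and are not taken from the paper. The step that adapts the retrieved translation to the unseen phrase is not detailed in the abstract and is not attempted here.

```python
from typing import Dict, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance with unit costs (an assumption)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_phrase(phrase: str, dictionary: Dict[str, str]) -> Tuple[str, int]:
    """Return the dictionary source phrase nearest to `phrase` and its distance."""
    tokens = phrase.split()
    candidates = ((src, edit_distance(tokens, src.split())) for src in dictionary)
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical toy dictionary, for illustration only (not data from the paper).
toy_dictionary = {
    "press the red button": "pulse el botón rojo",
    "open the front cover": "abra la cubierta frontal",
}

match, distance = closest_phrase("press the green button", toy_dictionary)
print(match, distance)
```

A linear scan over the dictionary is used only for clarity; a realistic phrase table would need an index or a pruned search to make the nearest-phrase lookup practical.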

Author information

Authors and Affiliations

Instituto Tecnológico de Informática
Jesús Tomás
Departamento de Comunicaciones, Universidad Politécnica de Valencia, 46071 Valencia, Spain
Jaime Lloret & Francisco Casacuberta

Editor information

Joan Martí, José Miguel Benedí, Ana Maria Mendonça, Joan Serrat

Cite this paper

Tomás, J., Lloret, J., Casacuberta, F. (2007). Phrase-Based Statistical Machine Translation Using Approximate Matching. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72847-4_61

DOI: https://doi.org/10.1007/978-3-540-72847-4_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72846-7
Online ISBN: 978-3-540-72847-4
eBook Packages: Computer Science, Computer Science (R0)