
Phrase-Based Statistical Machine Translation Using Approximate Matching

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2007)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4477)


Abstract

Phrase-based statistical models are among the most competitive pattern-recognition approaches to machine translation. In these models, the source sentence is segmented into phrases, and each phrase is then translated using a stochastic dictionary. One shortcoming of the phrase-based model is its limited generalization capability: if a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in our dictionary (i.e., it has not been seen in training), we look for the most similar phrase in the dictionary and try to adapt its translation to the source phrase. We use the well-known edit distance as the similarity measure. We present results on an English-Spanish task (XRCE).
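The lookup step described in the abstract (finding the dictionary phrase closest to an unseen source phrase by edit distance) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the distance is computed over word tokens with unit costs for insertion, deletion, and substitution, and the phrase table is a hypothetical plain dictionary mapping each source phrase to a single translation (a real stochastic dictionary would store probabilities). The names `word_edit_distance`, `closest_phrase`, and `table` are illustrative only, and the step that adapts the retrieved translation is not shown.

```python
from typing import Dict, List, Optional, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over word tokens (assumed unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


def closest_phrase(source: str,
                   table: Dict[str, str]) -> Optional[Tuple[str, str, int]]:
    """Return (dictionary phrase, its translation, distance) nearest to source.

    An exact match has distance 0. Ties keep the first entry in iteration
    order. Returns None for an empty table.
    """
    words = source.split()
    best: Optional[Tuple[str, str, int]] = None
    for phrase, translation in table.items():
        d = word_edit_distance(words, phrase.split())
        if best is None or d < best[2]:
            best = (phrase, translation, d)
    return best


# Hypothetical toy English-Spanish phrase table (not data from the paper).
table = {
    "the printer is ready": "la impresora está lista",
    "the scanner is busy": "el escáner está ocupado",
}
print(closest_phrase("the printer is not ready", table))
# → ('the printer is ready', 'la impresora está lista', 1)
```

Here the unseen phrase "the printer is not ready" is one deletion away from a dictionary entry, so that entry's translation becomes the candidate to adapt. The linear scan is chosen for clarity; because it compares the input against every entry, it may be too slow for a large phrase table.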

Editor information

Joan Martí José Miguel Benedí Ana Maria Mendonça Joan Serrat

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Tomás, J., Lloret, J., Casacuberta, F. (2007). Phrase-Based Statistical Machine Translation Using Approximate Matching. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72847-4_61

  • DOI: https://doi.org/10.1007/978-3-540-72847-4_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72846-7

  • Online ISBN: 978-3-540-72847-4

  • eBook Packages: Computer Science; Computer Science (R0)
