On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation

  • Conference paper
Advances in Artificial Intelligence - SBIA 2008 (SBIA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Abstract

In this paper we present experiments on automatically learning two kinds of bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out with Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora: pt-en and pt-es. They were designed to investigate the relevance of two factors in the induction process: (1) the coverage of the linguistic resources used when preprocessing the training corpora and (2) the maximum length threshold (for transfer rules) used in the induction process. The experiments indicate that both factors influence the automatic learning of bilingual resources.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de M. Caseli, H., das Graças V. Nunes, M., Forcada, M.L. (2008). On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_31

  • DOI: https://doi.org/10.1007/978-3-540-88190-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88189-6

  • Online ISBN: 978-3-540-88190-2

  • eBook Packages: Computer Science, Computer Science (R0)
