Abstract
In this paper we present experiments on automatically learning two kinds of bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out with Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora: pt–en and pt–es. They were designed to investigate the relevance of two factors in the induction process: (1) the coverage of the linguistic resources used when preprocessing the training corpora and (2) the maximum length threshold (for transfer rules) used in the induction process. The experiments indicate that both factors influence the automatic learning of bilingual resources.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de M. Caseli, H., das Graças V. Nunes, M., Forcada, M.L. (2008). On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_31
DOI: https://doi.org/10.1007/978-3-540-88190-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science (R0)