On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation

de M. Caseli, Helena; das Graças V. Nunes, Maria; Forcada, Mikel L.

doi:10.1007/978-3-540-88190-2_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Included in the following conference series:

Brazilian Symposium on Artificial Intelligence

Abstract

In this paper we present experiments on automatically learning two kinds of bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out with Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora: pt–en and pt–es. They were designed to investigate the relevance of two factors in the induction process, namely (1) the coverage of the linguistic resources used when preprocessing the training corpora and (2) the maximum length threshold (for transfer rules) used in the induction process. From these experiments, it is possible to conclude that both factors influence the automatic learning of bilingual resources.

Author information

Authors and Affiliations

NILC – ICMC, University of São Paulo, CP 668P – 13.560-970, São Carlos, SP, Brazil
Helena de M. Caseli & Maria das Graças V. Nunes
Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071, Alacant, Spain
Mikel L. Forcada

Editor information

Editors and Affiliations

Department of Systems Engineering and Computer Science - COPPE, Federal University of Rio de Janeiro (UFRJ), Brazil
Gerson Zaverucha
Department of Automation and Systems, Federal University of Santa Catarina, CEP 88.040-900, Brazil
Augusto Loureiro da Costa

Cite this paper

de M. Caseli, H., das Graças V. Nunes, M., Forcada, M.L. (2008). On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_31

DOI: https://doi.org/10.1007/978-3-540-88190-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science, Computer Science (R0)
