Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

Machine Translation

Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work; as a consequence, bilingual resources are usually harder to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when a less-resourced language is involved. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments on Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
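To make the idea of extracting dictionary entries from a word-aligned corpus more concrete, the following is a minimal sketch and not the authors' actual procedure. It assumes each sentence pair has already been analysed into (lemma, part-of-speech) tokens and word-aligned. The function name `induce_dictionary`, the `min_count` threshold, and the toy Portuguese–Spanish data are all hypothetical and introduced only for illustration.

```python
from collections import Counter, defaultdict


def induce_dictionary(aligned_corpus, min_count=2):
    """Collect source-to-target entries from word-aligned sentence pairs.

    aligned_corpus: iterable of (src_tokens, tgt_tokens, links), where each
    token is a (lemma, pos_tag) tuple and links is a list of (i, j) pairs
    meaning "source token i is aligned with target token j".
    min_count: hypothetical frequency threshold used to drop rare pairs.
    """
    counts = defaultdict(Counter)
    for src_tokens, tgt_tokens, links in aligned_corpus:
        for i, j in links:
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    dictionary = {}
    for src, targets in counts.items():
        # Keep only the translations seen at least min_count times.
        kept = [(tgt, n) for tgt, n in targets.most_common() if n >= min_count]
        if kept:
            dictionary[src] = kept
    return dictionary


# Toy, hand-made data (an assumption, not taken from the paper's corpora).
corpus = [
    ([("o", "det"), ("livro", "n")], [("el", "det"), ("libro", "n")], [(0, 0), (1, 1)]),
    ([("um", "det"), ("livro", "n")], [("un", "det"), ("libro", "n")], [(0, 0), (1, 1)]),
]
entries = induce_dictionary(corpus, min_count=2)
```

With this toy input, only the pair aligned twice (livro/libro) survives the threshold. A real system would also need to handle multi-word units and noisy alignments, which this sketch ignores.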


Corresponding author

Correspondence to Helena M. Caseli.

About this article

Cite this article

Caseli, H.M., Nunes, M.d.G.V. & Forcada, M.L. Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation. Machine Translation 20, 227–245 (2006). https://doi.org/10.1007/s10590-007-9027-9

  • DOI: https://doi.org/10.1007/s10590-007-9027-9
