Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

Caseli, Helena M.; Nunes, Maria das Graças V.; Forcada, Mikel L.

doi:10.1007/s10590-007-9027-9

Published: 04 January 2008

Volume 20, pages 227–245 (2006)

Machine Translation

Helena M. Caseli¹,
Maria das Graças V. Nunes¹ &
Mikel L. Forcada²

Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work and, as a consequence, bilingual resources are usually more difficult to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
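
As a rough illustration of the general idea of reading a bilingual dictionary off a word-aligned parallel corpus, the following minimal Python sketch counts how often each source lemma is aligned to each target lemma and keeps the most frequent pairing. It is not the authors' method: the function name `induce_dictionary`, the `min_count` threshold, the input layout (source lemmas, target lemmas, list of index links), and the toy Portuguese–Spanish sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def induce_dictionary(aligned_corpus, min_count=2):
    """Map each source lemma to its most frequently aligned target lemma.

    aligned_corpus: iterable of (src_lemmas, tgt_lemmas, links), where links
    is a list of (src_index, tgt_index) word alignments (hypothetical layout).
    min_count: hypothetical frequency threshold used to discard rare pairs.
    """
    counts = defaultdict(Counter)
    for src_lemmas, tgt_lemmas, links in aligned_corpus:
        for i, j in links:
            # Count one co-occurrence for every aligned word pair.
            counts[src_lemmas[i]][tgt_lemmas[j]] += 1

    dictionary = {}
    for src, tgt_counts in counts.items():
        tgt, n = tgt_counts.most_common(1)[0]
        if n >= min_count:
            dictionary[src] = tgt
    return dictionary


# Toy, hand-made data (not taken from the paper's corpora).
corpus = [
    (["o", "gato", "preto"], ["el", "gato", "negro"], [(0, 0), (1, 1), (2, 2)]),
    (["o", "carro", "preto"], ["el", "coche", "negro"], [(0, 0), (1, 1), (2, 2)]),
]
toy_dictionary = induce_dictionary(corpus)
```

With this toy input only the pairs seen at least twice survive the threshold, so `gato` and `carro` are dropped. A realistic setup would also have to handle multi-word units and part-of-speech information, which this sketch ignores.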



Author information

Authors and Affiliations

NILC – ICMC, University of São Paulo, São Carlos, SP, Brazil
Helena M. Caseli & Maria das Graças V. Nunes
Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, 03071, Alacant, Spain
Mikel L. Forcada

Corresponding author

Correspondence to Helena M. Caseli.

About this article

Cite this article

Caseli, H.M., Nunes, M.d.G.V. & Forcada, M.L. Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation. Machine Translation 20, 227–245 (2006). https://doi.org/10.1007/s10590-007-9027-9

Received: 28 May 2007
Accepted: 14 November 2007
Published: 04 January 2008
Issue Date: March 2006
DOI: https://doi.org/10.1007/s10590-007-9027-9
