Using Natural Alignment to Extract Translation Equivalents

Otero, Pablo Gamallo

doi:10.1007/11751984_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Included in the following conference series:

International Workshop on Computational Processing of the Portuguese Language

Abstract

Most methods for extracting bilingual lexicons from parallel corpora learn word correspondences from relatively small aligned segments, called sentences. They therefore require a corpus aligned at the sentence level. Such an alignment can require further manual correction if the parallel corpus contains insertions, deletions, or fuzzy sentence boundaries. This paper shows that it is possible to extract bilingual lexicons without aligning parallel texts at the sentence level. We describe a method to learn word translations from a very roughly aligned corpus, namely a corpus with quite long segments separated by “natural boundaries”. The results obtained using this method are very close to those obtained using sentence alignment. Experiments were performed on English-Portuguese and English-Spanish parallel texts.
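
The abstract does not specify the association measure or the way segments are delimited, so the following is only a minimal sketch of the general idea of learning word correspondences from co-occurrence in aligned segments. It assumes a Dice coefficient as the score and whitespace tokenization, and the names `dice_scores` and `best_translation` and the toy English-Portuguese data are hypothetical; none of this is taken from the paper. Each input pair may be a sentence or a much longer segment, since the code only counts co-occurrence within a pair.

```python
from collections import Counter
from itertools import product


def dice_scores(segment_pairs):
    """Score candidate word pairs by Dice coefficient over aligned segments.

    Assumption: a segment pair is (source_text, target_text); the segments may
    be sentences or longer spans, the counting is the same either way.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_segment, tgt_segment in segment_pairs:
        # Use sets so each word counts once per segment.
        src_words = set(src_segment.lower().split())
        tgt_words = set(tgt_segment.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    # Dice = 2 * joint segment count / (source count + target count).
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def best_translation(scores, source_word):
    """Return the highest-scoring target word for source_word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy aligned segments (hypothetical data, for illustration only).
segments = [
    ("the house is white", "a casa é branca"),
    ("the dog sleeps", "o cão dorme"),
    ("a white dog", "um cão branco"),
]
scores = dice_scores(segments)
print(best_translation(scores, "dog"))
```

With longer segments, more unrelated words co-occur within each pair, so scores computed this way become noisier. The paper's claim is that its method nevertheless yields results close to those obtained with sentence alignment.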

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the projects CESAR+ and GaricoTerm, ref: BFF2003-02866.

Author information

Authors and Affiliations

Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain
Pablo Gamallo Otero

Editor information

Editors and Affiliations

Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil
Renata Vieira
Departamento de Informática, Universidade de Évora, Portugal
Paulo Quaresma
NILC-ICMC, University of São Paulo, CP 668P, 13560-970, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes
L2F/INESC-ID Lisboa, Email: qa-clef@l2f.inesc-id.pt, Rua Alves Redol, 9, 1000-029, Lisboa, Portugal
Nuno J. Mamede
Instituto Militar de Engenharia, Praça General Tibúrcio, 80, Rio de Janeiro, Brazil
Cláudia Oliveira
Pontifícia Universidade Católica do Rio de Janeiro, Rua Marquês de São Vicente, 225, Rio de Janeiro, Brazil
Maria Carmelita Dias

About this paper

Cite this paper

Otero, P.G. (2006). Using Natural Alignment to Extract Translation Equivalents. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_5

DOI: https://doi.org/10.1007/11751984_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science, Computer Science (R0)