A Self-Learning Method of Parallel Texts Alignment

Ribeiro, António; Lopes, Gabriel; Mexia, João

doi:10.1007/3-540-39965-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1934)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

This paper describes a language-independent method for aligning parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points to improve the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis instead of heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process and are used to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used to make progressively finer-grained alignments and produce more reliable translation lexicons.
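The filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restriction to words that occur exactly once in each text, the fixed critical value `t_crit`, and the use of the textbook confidence band for the regression line are all assumptions made for illustration. Candidate points pair the position of a word in the source text with the position of its homograph (or of a previously learned lexicon equivalent) in the target text; points whose residual falls outside the band around the fitted line are discarded.

```python
from collections import Counter
from math import sqrt


def candidate_points(src_tokens, tgt_tokens, lexicon=None):
    """Return (source position, target position) candidate pairs.

    A source word is paired with its lexicon equivalent if one is known,
    otherwise with the identically spelled target word (a homograph).
    Assumption of this sketch: only words occurring exactly once in each
    text are used, so every pairing is unambiguous.
    """
    pairs = dict(lexicon or {})
    src_counts, tgt_counts = Counter(src_tokens), Counter(tgt_tokens)
    tgt_pos = {w: j for j, w in enumerate(tgt_tokens)}
    points = []
    for i, word in enumerate(src_tokens):
        target = pairs.get(word, word)  # fall back to the homograph
        if src_counts[word] == 1 and tgt_counts[target] == 1:
            points.append((i, tgt_pos[target]))
    return points


def filter_by_confidence_band(points, t_crit=2.0):
    """Keep the points lying inside a confidence band of the fitted line.

    Fits y = intercept + slope * x by least squares and keeps a point when
    |residual| <= t_crit * s * sqrt(1/n + (x - x_mean)^2 / Sxx).
    t_crit is a hypothetical fixed critical value; a real system would
    take it from the t distribution with n - 2 degrees of freedom.
    """
    n = len(points)
    if n < 3:
        return list(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        return list(points)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in points) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in points]
    s = sqrt(sum(r * r for r in residuals) / (n - 2))  # residual std. error
    kept = []
    for (x, y), r in zip(points, residuals):
        half_width = t_crit * s * sqrt(1 / n + (x - x_mean) ** 2 / sxx)
        if abs(r) <= half_width:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    src = "the European Commission approved Regulation 1234 in Brussels".split()
    tgt = "a Comissão Europeia aprovou o Regulamento 1234 em Bruxelas".split()
    print(candidate_points(src, tgt))
    print(candidate_points(src, tgt, lexicon={"Brussels": "Bruxelas"}))

    noisy = [(0, 0), (10, 11), (20, 19), (30, 31), (40, 39), (50, 51), (25, 200)]
    print(filter_by_confidence_band(noisy))
```

Passing the lexicon learned in one round as the `lexicon` argument of the next mirrors, in a simplified way, the re-use of acquired translation equivalents as new correspondence points.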

Author information

Authors and Affiliations

Departamento de Informática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Quinta da Torre, P-2825-114, Monte da Caparica, Portugal
António Ribeiro & Gabriel Lopes
Departamento de Matemática, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Quinta da Torre, P-2825-114, Monte da Caparica, Portugal
João Mexia

Editor information

Editors and Affiliations

Litton PRC, 1500 PRC Drive, McLean, VA 22102, USA
John S. White

About this paper

Cite this paper

Ribeiro, A., Lopes, G., Mexia, J. (2000). A Self-Learning Method of Parallel Texts Alignment. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_4

DOI: https://doi.org/10.1007/3-540-39965-8_4
Published: 02 July 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41117-8
Online ISBN: 978-3-540-39965-0
eBook Packages: Springer Book Archive
