A Self-Learning Method of Parallel Texts Alignment

  • Conference paper
Envisioning Machine Translation in the Information Future (AMTA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1934)

Abstract

This paper describes a language-independent method for aligning parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points to improve the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis rather than heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process and are used to build the initial word translation lexicon. At each step, the previously acquired lexicon is re-used to make progressively finer-grained alignments and to produce more reliable translation lexicons.
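
The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_points`, the `critical` parameter, and the use of a prediction-style band with a fixed critical value are all assumptions made for illustration. Candidate correspondence points are taken to be (position in text A, position in text B) pairs; a least-squares line is fitted, and points outside the band around that line are discarded.

```python
import math


def filter_points(points, critical=2.0):
    """Keep candidate points lying inside a band around the regression line.

    points   -- list of (x, y) word positions in the two texts (hypothetical input)
    critical -- assumed critical value scaling the band width; a real system
                would presumably take it from a t distribution
    """
    n = len(points)
    if n < 3:
        return list(points)  # too few points to estimate a residual error

    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        return list(points)  # all x equal: no line can be fitted

    # Ordinary least-squares slope and intercept.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    s = math.sqrt(sse / (n - 2))

    kept = []
    for x, y in points:
        # Band half-width grows as x moves away from the mean of the xs.
        half_width = critical * s * math.sqrt(1 + 1 / n + (x - x_mean) ** 2 / sxx)
        if abs(y - (intercept + slope * x)) <= half_width:
            kept.append((x, y))
    return kept


# Nine points on the diagonal plus one candidate far from it.
candidates = [(x, x) for x in range(0, 100, 10) if x != 50] + [(50, 500)]
reliable = filter_points(candidates)
```

In this example the point `(50, 500)` falls outside the band and is dropped, while the nine diagonal points are kept. In a self-learning setting of the kind the abstract describes, such a filter would presumably be re-applied at each step as new lexicon entries supply further candidate points.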

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ribeiro, A., Lopes, G., Mexia, J. (2000). A Self-Learning Method of Parallel Texts Alignment. In: White, J.S. (eds) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_4

  • DOI: https://doi.org/10.1007/3-540-39965-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41117-8

  • Online ISBN: 978-3-540-39965-0

  • eBook Packages: Springer Book Archive
