Fast and Accurate Sentence Alignment of Bilingual Corpora

  • Conference paper
Machine Translation: From Research to Real Users (AMTA 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2499)

Abstract

We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based either on sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
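
To make the sentence-length idea concrete, here is a minimal sketch of length-based alignment by dynamic programming. It is an illustration only, not the method presented in this paper: the function name `align_by_length`, the bead penalties in `BEADS`, and the character-length cost are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

# Allowed bead shapes (source sentences, target sentences) mapped to a
# fixed penalty. The shapes and penalty values are hypothetical.
BEADS = {(1, 1): 0.0, (1, 0): 10.0, (0, 1): 10.0, (2, 1): 4.0, (1, 2): 4.0}

Bead = Tuple[List[int], List[int]]


def align_by_length(src: List[str], tgt: List[str]) -> List[Bead]:
    """Return the cheapest sequence of beads as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    # Row-major order visits every predecessor of a cell before the cell itself.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                # Assumed cost: relative character-length mismatch, scaled by 10.
                mismatch = 10.0 * abs(len_s - len_t) / max(len_s + len_t, 1)
                candidate = cost[i][j] + penalty + mismatch
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the end of both texts to the start.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    source = ["I came.", "I saw it all.", "Bye."]
    target = ["Je suis venu et j'ai tout vu.", "Salut."]
    print(align_by_length(source, target))
    # [([0, 1], [0]), ([2], [1])]
```

In this toy example the first two source sentences are merged into one bead because their combined length matches the first target sentence far better than either does alone. A sketch like this uses no lexical information at all, which is the limitation that word-correspondence-based methods address.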

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moore, R.C. (2002). Fast and Accurate Sentence Alignment of Bilingual Corpora. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_14

  • DOI: https://doi.org/10.1007/3-540-45820-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44282-0

  • Online ISBN: 978-3-540-45820-3

  • eBook Packages: Springer Book Archive
