Abstract
We present a new approach to the problem of aligning English and Chinese sentences in a bilingual corpus based on adaptive learning. While using length information alone produces surprisingly good results for aligning bilingual French and English sentences, with success rates well over 95%, it does not fare as well for the alignment of English and Chinese sentences. The crux of the problem lies in the greater variability of the lengths and match types of the matched sentences. We propose to cope with such variability via a two-pass scheme under which model parameters can be learned from the data at hand. Experiments show that under this approach bilingual English-Chinese texts can be aligned effectively across diverse domains, genres and translation directions, with accuracy rates approaching 99%.
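The abstract does not spell out the model, so the following is only a minimal sketch of what a two-pass, length-based scheme could look like, not the authors' implementation. It assumes a generic dynamic-programming aligner over sentence lengths with five match types; the function names (`align`, `reestimate`, `two_pass_align`), the Gaussian-style length penalty, the starting parameters and the smoothing are all hypothetical choices made for illustration. The first pass aligns with generic defaults, the length ratio, variance and match-type priors are then re-estimated from that alignment, and the second pass aligns again with the learned values.

```python
import math
from collections import Counter

# Match types (source sentences, target sentences); an assumed inventory.
MATCH_TYPES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

def length_cost(ls, lt, ratio, var):
    """Penalty for pairing source length ls with target length lt."""
    if ls == 0 or lt == 0:
        return 0.0  # insertions and deletions are penalised by the prior only
    delta = (lt - ls * ratio) / math.sqrt(ls * var)
    return 0.5 * delta * delta

def align(src, tgt, ratio, var, prior):
    """Forward dynamic programming over sentence lengths; returns beads
    as (source start, source count, target start, target count)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for a, b in MATCH_TYPES:
                if i + a > n or j + b > m:
                    continue
                ls, lt = sum(src[i:i + a]), sum(tgt[j:j + b])
                c = (cost[i][j] + length_cost(ls, lt, ratio, var)
                     - math.log(prior[(a, b)]))
                if c < cost[i + a][j + b]:
                    cost[i + a][j + b] = c
                    back[i + a][j + b] = (a, b)
    beads, i, j = [], n, m
    while i > 0 or j > 0:  # trace the cheapest path back to the origin
        a, b = back[i][j]
        beads.append((i - a, a, j - b, b))
        i, j = i - a, j - b
    beads.reverse()
    return beads

def reestimate(src, tgt, beads, ratio, var, smoothing=1.0):
    """Learn ratio, variance and match-type priors from a first-pass alignment."""
    pairs = [(src[i], tgt[j]) for i, a, j, b in beads if (a, b) == (1, 1)]
    if pairs:  # otherwise keep the previous ratio and variance
        ratio = sum(t for _, t in pairs) / sum(s for s, _ in pairs)
        var = sum((t - s * ratio) ** 2 / s for s, t in pairs) / len(pairs)
        var = max(var, 1.0)  # floor keeps the penalty from blowing up
    counts = Counter((a, b) for _, a, _, b in beads)
    total = len(beads) + smoothing * len(MATCH_TYPES)
    prior = {mt: (counts[mt] + smoothing) / total for mt in MATCH_TYPES}
    return ratio, var, prior

def two_pass_align(src, tgt):
    """Pass 1 uses generic defaults; pass 2 uses parameters learned from pass 1."""
    ratio, var = 1.0, 6.8  # illustrative starting values, not from the paper
    prior = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
             (2, 1): 0.05, (1, 2): 0.05}
    first = align(src, tgt, ratio, var, prior)
    ratio, var, prior = reestimate(src, tgt, first, ratio, var)
    return align(src, tgt, ratio, var, prior), ratio

# Toy input: made-up sentence lengths for a four-sentence text and its translation.
src_lengths = [60, 45, 80, 30]
tgt_lengths = [20, 15, 27, 10]
beads, learned_ratio = two_pass_align(src_lengths, tgt_lengths)
```

In this toy run the starting ratio of 1.0 is a poor fit for the data, and the second pass replaces it with the ratio observed in the first-pass one-to-one beads, which is the kind of adaptation to "the data at hand" that the abstract describes in general terms.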
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chuang, T.C., You, G.N., Chang, J.S. (2002). Adaptive Bilingual Sentence Alignment. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_3
DOI: https://doi.org/10.1007/3-540-45820-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44282-0
Online ISBN: 978-3-540-45820-3
eBook Packages: Springer Book Archive