
Alignment and Matching of Bilingual English–Chinese News Texts

Machine Translation

Abstract

This paper presents a project to align and match bilingual English–Chinese news files downloaded from the China News Service’s website. The work involves the alignment of bilingual texts at the sentence and clause levels. In addition, the work requires matching of files, as the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have their own characteristics and, furthermore, the issue of file matching has its unique difficulties apart from the known problems of alignment work previously reported in the literature. To align the news files, we combine the criteria of “anchors” (i.e. unambiguous corresponding text elements) and sentence length. We employ Dynamic Programming first to align at the paragraph level, then to align at the sentence-clause level. The precision and recall of the alignment are satisfactory for free translation texts. To match English and Chinese files, we make use of the anchors alone. In file matching we encounter a “collision” problem due to contending matching candidates, and propose a recursive splitting algorithm to resolve the problem. We allow human intervention to improve the precision of matching, and succeed in achieving 100% precision with a fairly small amount of manual effort. Finally, to determine the various parameters used in aligning and matching, we utilize a Genetic Algorithm software package to obtain their optimized values.
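To illustrate how anchors and length can be combined in a Dynamic Programming alignment, the following is a minimal sketch and not the authors' implementation. The abstract does not give the cost function, so everything here is an assumption made for illustration: the cost function, the set of allowed groupings (`BEADS`), the names `bead_cost` and `align`, and the parameter values `length_ratio`, `anchor_weight`, `skip_penalty` and `merge_penalty`. Each segment is represented as a length together with the set of anchors found in it, where anchors are assumed to be normalised to a shared form such as a year or a percentage.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, Set[str]]  # (length, anchors found in the segment)
Bead = Tuple[Tuple[int, int], Tuple[int, int]]  # (source span, target span)

# Allowed groupings: (source segments consumed, target segments consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def bead_cost(src: List[Segment], tgt: List[Segment],
              length_ratio: float = 1.6, anchor_weight: float = 1.0,
              skip_penalty: float = 1.0, merge_penalty: float = 0.5) -> float:
    """Hypothetical cost of pairing src with tgt (lower is better)."""
    if not src or not tgt:
        return skip_penalty  # a segment left without a counterpart
    src_len = sum(length for length, _ in src)
    tgt_len = sum(length for length, _ in tgt)
    src_anchors = set().union(*(a for _, a in src))
    tgt_anchors = set().union(*(a for _, a in tgt))
    # Length criterion: relative gap between expected and actual target length.
    expected = src_len * length_ratio
    length_term = abs(tgt_len - expected) / max(expected, tgt_len, 1)
    # Anchor criterion: reward shared anchors, penalise unmatched ones.
    shared = len(src_anchors & tgt_anchors)
    unshared = len(src_anchors ^ tgt_anchors)
    cost = length_term + anchor_weight * (unshared - shared)
    if len(src) + len(tgt) > 2:
        cost += merge_penalty  # mild preference for 1-1 groupings
    return cost


def align(src: List[Segment], tgt: List[Segment], **params) -> List[Bead]:
    """Return the minimum-cost sequence of beads as index spans."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(src[i:ni], tgt[j:nj], **params)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Trace the best path back from the end of both texts.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return beads[::-1]


# Invented example: three English segments against two Chinese segments.
english = [(10, {"1998"}), (8, set()), (12, {"25%"})]
chinese = [(16, {"1998"}), (33, {"25%"})]
print(align(english, chinese))
# prints [((0, 1), (0, 1)), ((1, 3), (1, 2))]
```

In this invented example the first English segment pairs with the first Chinese segment, and the second and third English segments are grouped against the second Chinese segment. The same routine could be run once over paragraphs and again over sentences or clauses within each aligned paragraph, mirroring the two-pass order described above. The parameter values are placeholders; in the paper such values are tuned with a Genetic Algorithm rather than set by hand.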




Cite this article

Xu, D., Tan, C.L. Alignment and Matching of Bilingual English–Chinese News Texts. Machine Translation 14, 1–33 (1999). https://doi.org/10.1023/A:1008092103873


  • DOI: https://doi.org/10.1023/A:1008092103873
