
A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

Machine Translation

Abstract

Technical-term translation is one of the most difficult tasks for human translators because (1) most translators are not familiar with domain-specific terms and terminology, and (2) such terms are not adequately covered by printed dictionaries. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any word that is part of a technical term in the source language, the algorithm produces a ranked list of candidate matches for it in the target language. Potential translations for the term are then compiled from the matched words and are also ranked. We show how this ranked list helps translators with technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a sentence-aligned, clean parallel corpus without insertion, deletion, or OCR noise. Our algorithm is language- and character-set-independent and robust to noise in the corpus. We show that it requires minimal preprocessing and can obtain technical-word translations without sentence-boundary identification or sentence alignment, both from the English–Japanese awk manual corpus, which contains noise from text insertions and deletions, and from the English–Chinese HKUST bilingual corpus. On the awk corpus we obtain a precision of 55.35% for word translation, including rare words, counting only the best candidate and direct translations. On the HKUST corpus, the precision of the best-candidate translation is 89.93%. The potential term translations produced by the program give bilingual speakers a 47% improvement in translating technical terms.
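The abstract does not say how candidate matches are scored. As an illustration only, here is a minimal sketch of one position-based approach in the spirit of the dynamic-time-warping word-pair matching of Fung and McKeown (1994), "Aligning Noisy Parallel Corpora Across Language Groups: Word Pair Feature Matching by Dynamic Time Warping"; it is not presented as the algorithm of this paper. It assumes both texts are already tokenized (Japanese or Chinese text would first need a segmenter), represents each word by the gaps between its successive occurrences scaled by text length, and ranks target words by a length-normalized DTW distance. The function names, the length scaling, and the `max_freq_ratio` frequency filter are hypothetical choices made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def position_index(tokens: Sequence[str]) -> Dict[str, List[int]]:
    """Map each token to the ordered list of positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


def recency_vector(positions: Sequence[int], text_len: int) -> List[float]:
    """Gaps between successive occurrences, scaled by text length (an assumption)."""
    return [(b - a) / text_len for a, b in zip(positions, positions[1:])]


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic dynamic time warping cost with an absolute-difference local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Cheapest predecessor on the warping path: vertical, horizontal or diagonal step.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def rank_candidates(
    src_positions: Sequence[int],
    src_len: int,
    tgt_index: Dict[str, List[int]],
    tgt_len: int,
    top_k: int = 5,
    max_freq_ratio: float = 2.0,  # hypothetical pre-filter threshold
) -> List[Tuple[str, float]]:
    """Rank target words by DTW distance between recency vectors (lower is better)."""
    src_vec = recency_vector(src_positions, src_len)
    if not src_vec:
        return []  # a word seen only once has no gaps to compare
    n_src = len(src_positions)
    scored: List[Tuple[str, float]] = []
    for word, positions in tgt_index.items():
        n_tgt = len(positions)
        # Hypothetical filter: skip words whose frequency differs too much from the source word's.
        if max(n_src, n_tgt) > max_freq_ratio * min(n_src, n_tgt):
            continue
        tgt_vec = recency_vector(positions, tgt_len)
        if not tgt_vec:
            continue
        # Divide by combined length so vectors of different lengths are comparable.
        score = dtw_distance(src_vec, tgt_vec) / (len(src_vec) + len(tgt_vec))
        scored.append((word, score))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]
```

A call such as `rank_candidates(position_index(en_tokens)["awk"], len(en_tokens), position_index(ja_tokens), len(ja_tokens))` would return up to five `(word, score)` pairs, best first. Because DTW lets one gap sequence stretch or compress against the other, an inserted or deleted passage shifts positions locally without destroying the overall gap pattern, which is one reason signals of this kind can work without sentence alignment.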




About this article

Cite this article

Fung, P., McKeown, K. A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. Machine Translation 12, 53–87 (1997). https://doi.org/10.1023/A:1007974605290
