A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

Fung, Pascale; McKeown, Kathleen

doi:10.1023/A:1007974605290

Published: March 1997

Volume 12, pages 53–87 (1997)

Machine Translation

Abstract

Technical-term translation is one of the most difficult tasks for human translators because (1) most translators are unfamiliar with technical terms and domain-specific terminology and (2) printed dictionaries do not adequately cover such terms. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any source-language word that is part of a technical term, the algorithm produces a ranked list of candidate matches in the target language. Potential translations for the whole term are compiled from the matched words and are also ranked. We show how this ranked list helps translators with technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a clean, sentence-aligned parallel corpus free of insertions, deletions, and OCR noise. Our algorithm is language- and character-set-independent and is robust to noise in the corpus. We show that it requires minimal preprocessing and obtains technical-word translations without sentence-boundary identification or sentence alignment, both from the English–Japanese awk manual corpus, which contains noise from text insertions and deletions, and from the English–Chinese HKUST bilingual corpus. On the awk corpus, we obtain a precision of 55.35% for word translation, including rare words, counting only the best candidate and direct translations. On the HKUST corpus, the precision of the best-candidate translation is 89.93%. The potential term translations produced by the program give bilingual speakers a 47% improvement in translating technical terms.
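The abstract states only that candidates are ranked without sentence alignment and that the method tolerates inserted or deleted text; it does not give the matching procedure. As a rough illustration under those constraints, the sketch below assumes one plausible approach: each word is represented by the gaps between its successive occurrences (scaled by text length so two languages are comparable), and gap sequences are compared with dynamic time warping (DTW), which lets one gap align with several and so absorbs local insertions or deletions. This is not presented as the authors' implementation; the function names `gap_vector`, `dtw_distance`, and `rank_candidates` and the toy positions are hypothetical.

```python
"""Hypothetical sketch: rank target-language candidates for a source word by
comparing occurrence-gap sequences with dynamic time warping (DTW)."""
from typing import Dict, List, Sequence, Tuple


def gap_vector(positions: Sequence[int], text_length: int) -> List[float]:
    """Gaps between successive occurrences of a word, scaled by text length
    so that texts of different lengths in two languages are comparable."""
    scaled = [p / text_length for p in sorted(positions)]
    return [later - earlier for earlier, later in zip(scaled, scaled[1:])]


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW with absolute-difference local cost, normalized by len(x) + len(y).
    Both sequences must be non-empty."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in x only
                                     cost[i][j - 1],      # advance in y only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m] / (n + m)


def rank_candidates(
    source_positions: Sequence[int],
    source_length: int,
    target_index: Dict[str, Sequence[int]],
    target_length: int,
) -> List[Tuple[str, float]]:
    """Return (target word, distance) pairs, best (smallest distance) first."""
    src = gap_vector(source_positions, source_length)
    if not src:
        return []  # a word seen fewer than twice yields no gaps
    scored = []
    for word, positions in target_index.items():
        tgt = gap_vector(positions, target_length)
        if tgt:  # skip candidates seen fewer than twice
            scored.append((word, dtw_distance(src, tgt)))
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy token positions (invented for illustration): the source word occurs
    # 4 times in a 1000-token text; the target text has 1200 tokens.
    ranking = rank_candidates(
        [100, 300, 310, 700], 1000,
        {"A": [118, 362, 370, 842], "B": [50, 600, 900]}, 1200,
    )
    print(ranking[0][0])  # prints "A": its gap pattern tracks the source word
```

Under this reading, a "best candidate" precision figure would count only whether `ranking[0]` is a correct translation. A practical system would also need frequency thresholds and a step that compiles matched words into term translations, both of which this sketch omits.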

Author information

Authors and Affiliations

Computer Science Department, Columbia University, New York, NY, 10027, U.S.A.
Pascale Fung & Kathleen McKeown

About this article

Cite this article

Fung, P., McKeown, K. A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups. Machine Translation 12, 53–87 (1997). https://doi.org/10.1023/A:1007974605290
