Termight: Coordinating Humans and Machines in Bilingual Terminology Acquisition

Machine Translation

Abstract

We propose a semi-automatic tool, termight, that supports the construction of bilingual glossaries. Termight consists of two components that address the two subtasks in glossary construction: (a) preparing a monolingual list of all technical terms in a source-language document, and (b) finding the translations for these terms in parallel source–target documents. As a first step in each component, the tool automatically extracts candidate terms and candidate translations, using term-extraction and word-alignment algorithms. It then performs several additional preprocessing steps that greatly facilitate human post-editing of the candidate lists. These steps include grouping and sorting the candidates and associating example concordance lines with each candidate. Finally, the data prepared in preprocessing is presented to the user through an interactive interface that supports quick post-editing operations. Termight was deployed by translators at AT&T Business Translation Services (formerly AT&T Language Line Services), leading to very high rates of semi-automatic glossary construction.
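The preprocessing steps named in the abstract (grouping, sorting, and attaching concordance lines) can be illustrated with a minimal sketch. The abstract does not specify the grouping or sorting criteria, so the choices here are assumptions made for illustration only: candidates are grouped by their last word, groups are ordered by size, and a concordance line is simply a sentence containing the term. The function name `prepare_candidates` and its parameters are hypothetical and are not taken from termight.

```python
from collections import defaultdict


def prepare_candidates(sentences, candidates, max_examples=2):
    """Group candidate terms, sort the groups, and attach example sentences.

    Assumptions (not from the source): terms are grouped by their last
    word, groups are ordered largest first, and a concordance line is any
    sentence that contains the term as a substring.
    """
    groups = defaultdict(list)
    for term in candidates:
        head = term.split()[-1].lower()  # assumed grouping key: last word
        examples = [s for s in sentences if term.lower() in s.lower()]
        groups[head].append((term, examples[:max_examples]))

    # Larger groups first; ties are broken alphabetically by group key.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    # Within a group, list the terms alphabetically.
    return [(head, sorted(terms, key=lambda t: t[0].lower()))
            for head, terms in ordered]


# Toy input: in a real setting the candidates would come from a
# term-extraction step rather than being listed by hand.
sentences = [
    "Open the dialog box and select a font.",
    "The default dialog box lists every font size.",
    "Choose a font size from the list.",
]
candidates = ["dialog box", "default dialog box", "font size"]

result = prepare_candidates(sentences, candidates)
for head, terms in result:
    print(head)
    for term, examples in terms:
        print("   ", term, "|", examples[0] if examples else "(no example)")
```

Presenting related candidates together with a sentence of context is the kind of arrangement that lets a human reviewer accept or reject several entries at once, which is the purpose the abstract gives for these steps.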

About this article

Cite this article

Dagan, I., Church, K. Termight: Coordinating Humans and Machines in Bilingual Terminology Acquisition. Machine Translation 12, 89–107 (1997). https://doi.org/10.1023/A:1007926723945

  • DOI: https://doi.org/10.1023/A:1007926723945
