From syntactic encodings to thematic roles: Building lexical entries for interlingual MT

Machine Translation

Abstract

Our goal is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. We describe techniques that predict salient linguistic features of a non-English word from the features of its English gloss (i.e., translation) in a bilingual dictionary. Although inexact, owing to imprecise glosses and language-to-language variation, these techniques can augment an existing dictionary with reasonable accuracy, thus saving significant time. We have conducted two experiments that demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978). We show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than it would be to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on Levin (1993), from which we can derive a more refined set of thematic grids. In this second experiment, we show that a brute-force, non-robust technique provides 72% accuracy for semantic classification of LDOCE verbs; we then show that it is possible to approach this yield with a more robust technique based on fine-tuned statistical correlations. We further suggest that this yield could be raised by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. We conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, the amount of intervention required falls significantly as more knowledge about the syntax-semantics relation is introduced.
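The gloss-based idea behind the first experiment can be illustrated with a minimal sketch. It is not the authors' implementation: the code-to-grid table, the role names, the English code assignments, and the three transliterated Arabic entries are toy data assumed for illustration, and the function name `candidate_grids` is hypothetical. The sketch only shows the shape of the technique: look up a word's English glosses, collect the syntactic codes of those glosses, and map each code to a candidate thematic grid for later hand verification.

```python
from typing import Dict, List, Set, Tuple

Grid = Tuple[str, ...]

# ASSUMPTION: toy mapping from syntactic codes to thematic grids.
# The real mapping in the paper is not given in the abstract.
CODE_TO_GRID: Dict[str, Grid] = {
    "I0": ("agent",),                  # intransitive
    "T1": ("agent", "theme"),          # transitive, one noun object
    "D1": ("agent", "goal", "theme"),  # ditransitive, two noun objects
}

# ASSUMPTION: toy English lexicon (verb -> syntactic codes).
ENGLISH_CODES: Dict[str, List[str]] = {
    "write": ["I0", "T1"],
    "sleep": ["I0"],
    "give": ["T1", "D1"],
}

# ASSUMPTION: toy bilingual dictionary (foreign verb -> English glosses).
GLOSSES: Dict[str, List[str]] = {
    "kataba": ["write"],
    "naama": ["sleep"],
    "manaHa": ["give", "grant"],  # "grant" has no codes in the toy lexicon
}


def candidate_grids(word: str) -> Set[Grid]:
    """Collect every thematic grid licensed by any gloss of `word`.

    Glosses or codes missing from the toy tables are skipped, so the
    result may be empty; such words would need manual treatment.
    """
    grids: Set[Grid] = set()
    for gloss in GLOSSES.get(word, []):
        for code in ENGLISH_CODES.get(gloss, []):
            grid = CODE_TO_GRID.get(code)
            if grid is not None:
                grids.add(grid)
    return grids


if __name__ == "__main__":
    for verb in GLOSSES:
        print(verb, sorted(candidate_grids(verb)))
```

Because a word inherits the grids of all its glosses, the output is a set of candidates, not a single answer. This matches the abstract's framing, in which a human verifies automatically proposed grids instead of writing them from scratch.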

References

  • Alshawi, H. 1989. Analysing the Dictionary Definitions. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 153–169.

  • Boguraev, B. and T. Briscoe. 1989. Utilising the LDOCE Grammar Codes. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.

  • Dorr, B.J. 1993. Machine Translation: A View from the Lexicon. MIT Press, Cambridge, MA.

  • Dorr, B.J., J. Hendler, S. Blanksteen, and B. Migdalof. 1994. Use of LCS and Discourse for Intelligent Tutoring: On Beyond Syntax. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.

  • Dorr, B.J. and D. Jones. 1995. Automatic Extraction of Semantic Classes from Syntactic Information in Online Resources. Technical Report UMIACS/CS TR, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.

  • Dorr, B.J., D. Lin, J. Lee, and S. Suh. 1994. A Paradigm for Non-head-driven Parsing: Parameterized Message-Passing. In Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.

  • Farwell, D., L. Guthrie, and Y. Wilks. 1993. Automatically Creating Lexical Entries for ULTRA, a Multilingual MT System. Machine Translation, 8(3).

  • Fillmore, C.J. 1968. The Case for Case. In E. Bach and R.T. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart, and Winston, pages 1–88.

  • Fontenelle, T. and J. Vanandroye. 1989. Retrieving Ergative Verbs from a Lexical Data Base. Dictionaries, 11:11–39.

  • Grimshaw, J. 1990. Argument Structure. MIT Press, Cambridge, MA.

  • Gruber, J.S. 1965. Studies in Lexical Relations. Ph.D. thesis, Information Science, Massachusetts Institute of Technology, Cambridge, MA.

  • Jackendoff, R.S. 1983. Semantics and Cognition. MIT Press, Cambridge, MA.

  • Jackendoff, R.S. 1990. Semantic Structures. MIT Press, Cambridge, MA.

  • Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.

  • Lin, D., B.J. Dorr, J. Lee, and S. Suh. 1994. A Parameter-Based Message-Passing Parser for MT of Korean and English. In Proceedings of the Association for MT in the Americas Conference on Partnerships in Translation Technology, pages 149–156, Columbia, MD.

  • Lonsdale, D., T. Mitamura, and E. Nyberg. 1995. Acquisition of Large Lexicons for Practical Knowledge-Based MT. Machine Translation, 9(3).

  • Montemagni, S. and L. Vanderwende. 1992. Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 546–552, Nantes, France.

  • Pesetsky, D. 1982. Paths and Categories. Unpublished MIT Ph.D. dissertation.

  • Pinker, S. 1989. Learnability and Cognition: The Acquisition of Argument Structure. MIT Press, Cambridge, MA.

  • Procter, P. 1978. Longman Dictionary of Contemporary English. Longman, London.

  • Sanfilippo, A. and V. Poznanski. 1992. The Acquisition of Lexical Knowledge from Combined Machine-Readable Dictionary Resources. In Proceedings of the Applied Natural Language Processing Conference, pages 80–87, Trento, Italy.

  • Weinberg, A., J. Garman, J. Martin, and P. Merlo. 1994. Principle-Based Parser for Foreign Language Training in German and Arabic. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.

  • Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1989. A Tractable Machine Dictionary as a Resource for Computational Semantics. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.

  • Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1990. Providing Machine Tractable Dictionary Tools. Machine Translation, 5(2):99–154.

Cite this article

Dorr, B.J., Garman, J. & Weinberg, A. From syntactic encodings to thematic roles: Building lexical entries for interlingual MT. Mach Translat 9, 221–250 (1994). https://doi.org/10.1007/BF00980579

  • DOI: https://doi.org/10.1007/BF00980579