Abstract
Our goal is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. We describe techniques that predict salient linguistic features of a non-English word from the features of its English gloss (i.e., translation) in a bilingual dictionary. While not exact, owing to inexact glosses and language-to-language variations, these techniques can augment an existing dictionary with reasonable accuracy, thus saving significant time. We have conducted two experiments that demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978). We show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on Levin (1993), from which we can derive a more refined set of thematic grids. In this second experiment, we show that a brute-force, non-robust technique provides 72% accuracy for semantic classification of LDOCE verbs; we then show that it is possible to approach this yield with a more robust technique based on fine-tuned statistical correlations. We further suggest that this yield could be raised by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. We conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, the amount of intervention required decreases significantly as more knowledge about the syntax-semantics relation is introduced.
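The gloss-based idea in the first experiment can be illustrated with a minimal sketch. It assumes a toy table of syntactic codes per English verb and a toy table from codes to thematic grids. Both tables, the role labels, and the function name `candidate_grids` are hypothetical and are not taken from LDOCE or from the article. The sketch only shows the general shape of the technique: look up each English gloss, collect the grids its codes suggest, and rank the candidates for a human to verify.

```python
from collections import Counter

# Hypothetical toy data, NOT taken from LDOCE or from the article.
# Each English verb is paired with illustrative syntactic codes.
ENGLISH_CODES = {
    "eat": ["T1", "I0"],
    "give": ["D1", "T1"],
    "sleep": ["I0"],
}

# Hypothetical mapping from a syntactic code to a candidate thematic grid.
CODE_TO_GRID = {
    "I0": ("agent",),
    "T1": ("agent", "theme"),
    "D1": ("agent", "goal", "theme"),
}


def candidate_grids(glosses):
    """Propose thematic grids for a non-English verb from its English glosses.

    Each gloss votes once for every grid that one of its codes maps to.
    Grids are returned with the most-supported first; ties are broken by
    ordinary tuple ordering so the result is deterministic.
    """
    votes = Counter()
    for gloss in glosses:
        # Use a set so a repeated code on one gloss counts only once.
        for code in set(ENGLISH_CODES.get(gloss, [])):
            grid = CODE_TO_GRID.get(code)
            if grid is not None:
                votes[grid] += 1
    return sorted(votes, key=lambda g: (-votes[g], g))


if __name__ == "__main__":
    # A verb glossed as both "give" and "eat" (an artificial example).
    for grid in candidate_grids(["give", "eat"]):
        print(grid)
```

Ranking by the number of supporting glosses is a design choice of this sketch, not a claim about the article's method. Glosses missing from the toy table contribute nothing, which mirrors the point that inexact glosses leave work for hand verification.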
References
Alshawi, H. 1989. Analysing the Dictionary Definitions. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 153–169.
Boguraev, B. and T. Briscoe. 1989. Utilising the LDOCE Grammar Codes. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.
Dorr, B.J. 1993. Machine Translation: A View from the Lexicon. MIT Press, Cambridge, MA.
Dorr, B.J., J. Hendler, S. Blanksteen, and B. Migdalof. 1994. Use of LCS and Discourse for Intelligent Tutoring: On Beyond Syntax. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.
Dorr, B.J. and D. Jones. 1995. Automatic Extraction of Semantic Classes from Syntactic Information in Online Resources. Technical Report UMIACS/CS TR, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.
Dorr, B.J., D. Lin, J. Lee, and S. Suh. 1994. A Paradigm for Non-head-driven Parsing: Parameterized Message-Passing. In Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.
Farwell, D., L. Guthrie, and Y. Wilks. 1993. Automatically Creating Lexical Entries for ULTRA, a Multilingual MT System. Machine Translation, 8(3).
Fillmore, C.J. 1968. The Case for Case. In E. Bach and R.T. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart, and Winston, pages 1–88.
Fontenelle, T. and J. Vanandroye. 1989. Retrieving Ergative Verbs from a Lexical Data Base. Dictionaries, 11:11–39.
Grimshaw, J. 1990. Argument Structure. MIT Press, Cambridge, MA.
Gruber, J.S. 1965. Studies in Lexical Relations. Ph.D. thesis, Information Science, Massachusetts Institute of Technology, Cambridge, MA.
Jackendoff, R.S. 1983. Semantics and Cognition. MIT Press, Cambridge, MA.
Jackendoff, R.S. 1990. Semantic Structures. MIT Press, Cambridge, MA.
Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.
Lin, D., B.J. Dorr, J. Lee, and S. Suh. 1994. A Parameter-Based Message-Passing Parser for MT of Korean and English. In Proceedings of the Association for MT in the Americas Conference on Partnerships in Translation Technology, pages 149–156, Columbia, MD.
Lonsdale, D., T. Mitamura, and E. Nyberg. 1995. Acquisition of Large Lexicons for Practical Knowledge-Based MT. Machine Translation, 9(3).
Montemagni, S. and L. Vanderwende. 1992. Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 546–552, Nantes, France.
Pesetsky, D. 1982. Paths and Categories. Unpublished MIT Ph.D. dissertation.
Pinker, S. 1989. Learnability and Cognition: The Acquisition of Argument Structure. MIT Press, Cambridge, MA.
Procter, P. 1978. Longman Dictionary of Contemporary English. Longman, London.
Sanfilippo, A. and V. Poznanski. 1992. The Acquisition of Lexical Knowledge from Combined Machine-Readable Dictionary Resources. In Proceedings of the Applied Natural Language Processing Conference, pages 80–87, Trento, Italy.
Weinberg, A., J. Garman, J. Martin, and P. Merlo. 1994. Principle-Based Parser for Foreign Language Training in German and Arabic. In M. Holland, J. Kaplan, and M. Sams, editors, Intelligent Language Tutors: Balancing Theory and Technology. Lawrence Erlbaum Associates, Hillsdale, NJ.
Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1989. A Tractable Machine Dictionary as a Resource for Computational Semantics. In B. Boguraev and T. Briscoe, editors, Computational Lexicography for Natural Language Processing. Longman, London, pages 85–116.
Wilks, Y., D. Fass, C.M. Guo, J.E. McDonald, T. Plate, and B.M. Slator. 1990. Providing Machine Tractable Dictionary Tools. Machine Translation, 5(2):99–154.
Cite this article
Dorr, B.J., Garman, J. & Weinberg, A. From syntactic encodings to thematic roles: Building lexical entries for interlingual MT. Mach Translat 9, 221–250 (1994). https://doi.org/10.1007/BF00980579
DOI: https://doi.org/10.1007/BF00980579