Construction of a Chinese–English Verb Lexicon for Machine Translation and Embedded Multilingual Applications

Machine Translation

Abstract

This paper addresses the automatic acquisition of lexical knowledge for the rapid construction of engines for machine translation and embedded multilingual applications. We describe new techniques for large-scale construction of a Chinese–English verb lexicon, and we evaluate the coverage and effectiveness of the resulting lexicon. Building on an existing Chinese conceptual database called HowNet and a large, semantically rich English verb database, we use thematic-role information to create links between Chinese concepts and English classes. We apply the metrics of recall and precision to evaluate the coverage and effectiveness of the linguistic resources. The results indicate that: (a) we can obtain reliable Chinese–English entries both with and without pre-existing semantic links between the two languages; (b) where pre-existing semantic links are available, we can produce a more robust lexical resource by merging them with our semantically rich English database; (c) in comparisons with manual lexicon creation, our automatic techniques achieved 62% precision, compared with a much lower 10% precision for arbitrary assignment of semantic links.
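
As an illustration of the evaluation metrics named above, the following is a minimal sketch of how precision and recall might be computed for automatically proposed links against a manually created reference set. It is not the procedure used in the paper: the function name `precision_recall`, the representation of a link as a (Chinese concept, English class) pair, and the sample identifiers are all assumptions introduced here for illustration.

```python
def precision_recall(proposed, gold):
    """Return (precision, recall) of proposed links against a gold set.

    Each link is assumed to be a hashable pair such as
    (chinese_concept_id, english_class_id); this representation is
    hypothetical and chosen only for the example.
    """
    proposed, gold = set(proposed), set(gold)
    correct = proposed & gold  # links found in both sets
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Made-up identifiers, not data from the paper.
gold = {("c1", "class_a"), ("c2", "class_b"), ("c3", "class_c"), ("c4", "class_d")}
proposed = {("c1", "class_a"), ("c2", "class_b"), ("c3", "class_x")}

p, r = precision_recall(proposed, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this reading, precision is the share of proposed links that also appear in the manual set, and recall is the share of manual links that the automatic technique recovers.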

Cite this article

Dorr, B.J., Levow, GA. & Lin, D. Construction of a Chinese–English Verb Lexicon for Machine Translation and Embedded Multilingual Applications. Machine Translation 17, 99–137 (2002). https://doi.org/10.1023/B:COAT.0000010116.83274.c3