Abstract
In this paper, we study the problem of adding a large number of new words to a Chinese thesaurus according to their definitions in a Chinese dictionary, while minimizing the effort of hand tagging. To address the problem, we first use a supervised learning technique to learn a set of defining formats for each class in the thesaurus; these formats characterize the regularities in the definitions of the words in that class. We then use standard techniques from graph theory to derive a minimal subset of the new words that meets the following condition: if the words in this subset are added to the thesaurus by hand, the remaining new words can be added automatically by matching their definitions against the defining formats of each class. The method uses little, if any, language-specific or thesaurus-specific knowledge, and can be applied to thesauri of other languages.
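The abstract does not spell out how the graph is built, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a directed graph over the new words in which an edge u -> v means "once u is in the thesaurus, v can be placed automatically by matching its definition". Under that assumption, a smallest set of words to tag by hand contains one word from each strongly connected component that has no incoming edge from another component. The function name `minimal_seed_set` and the toy words are hypothetical.

```python
from collections import defaultdict


def minimal_seed_set(nodes, edges):
    """Pick one word from every source strongly connected component.

    Assumption (not from the paper): edge (u, v) means v can be added
    automatically once u is already in the thesaurus.
    """
    # Collect every word, including ones that only appear in edges.
    all_nodes = list(dict.fromkeys(list(nodes) + [w for e in edges for w in e]))
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Kosaraju pass 1: record nodes in order of DFS finish time.
    visited, order = set(), []
    for start in all_nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Kosaraju pass 2: sweep the reversed graph in decreasing finish time;
    # each sweep labels one strongly connected component.
    component = {}
    for start in reversed(order):
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in component:
                    component[prev] = start
                    stack.append(prev)

    # A component reached from a different component is not a source.
    has_incoming = {component[v] for u, v in edges if component[u] != component[v]}
    return sorted({c for c in component.values() if c not in has_incoming})


# Toy example with hypothetical words: a and b define each other,
# c depends on b or d, and e depends on c.
words = ["a", "b", "c", "d", "e"]
deps = [("a", "b"), ("b", "a"), ("b", "c"), ("d", "c"), ("c", "e")]
seeds = minimal_seed_set(words, deps)
print(seeds)
```

In this toy graph two words must be tagged by hand: one of the mutually dependent pair a and b, plus d. If a word instead required all of the words in its definition to be present before it could be matched, reachability would no longer be the right criterion and a different formulation would be needed.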
Cite this article
Donghong, J., Junping, G. & Changning, H. Adding New Words into A Chinese Thesaurus. Computers and the Humanities 31, 203–227 (1997). https://doi.org/10.1023/A:1000980024577