Research on Automatic Chinese Multi-word Term Extraction Based on Term Component

Kang, Wei; Sui, Zhifang

doi:10.1007/978-3-642-00831-3_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Abstract

This paper presents a method for automatically extracting Chinese multi-word terms that combines unithood and termhood measures. The unithood of a candidate term is measured by the strength of its inner unity and by its marginal variety. Termhood is estimated from the candidate's term components. Inspired by the economical law of term generation, we propose two measures of how likely a candidate is to be a true term: the first is based on the domain speciality of the term, and the second on the similarity between the candidate and a template that encodes structural information about terms. Experiments in the IT and medicine domains show that the method is effective and portable across domains.
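
The abstract does not give the formulas behind the two unithood signals, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the corpus is already word-segmented, uses pointwise mutual information as a stand-in for "strength of inner unity", and uses the entropy of left and right neighbours as a stand-in for "marginal variety". All function names are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(sentences):
    """Count unigrams and adjacent bigrams over pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def inner_unity(left, right, unigrams, bigrams):
    """PMI of two adjacent components (assumed proxy for inner unity)."""
    pair_count = bigrams[(left, right)]
    if pair_count == 0:
        return float("-inf")  # the two components never occur side by side
    p_pair = pair_count / sum(bigrams.values())
    p_left = unigrams[left] / sum(unigrams.values())
    p_right = unigrams[right] / sum(unigrams.values())
    return math.log2(p_pair / (p_left * p_right))


def neighbor_entropy(neighbors):
    """Shannon entropy (in bits) of a neighbour-count distribution."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def marginal_variety(candidate, sentences):
    """Smaller of the left and right neighbour entropies of a candidate.

    Assumed proxy: a candidate seen with many different neighbours on both
    sides is more likely to be a complete unit. Sentence boundaries are
    counted as the pseudo-neighbours "<s>" and "</s>".
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == tuple(candidate):
                left[tokens[i - 1] if i > 0 else "<s>"] += 1
                right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return min(neighbor_entropy(left), neighbor_entropy(right))


# Toy usage on a tiny hand-segmented corpus (hypothetical data).
sentences = [
    ["安装", "操作", "系统", "软件"],
    ["升级", "操作", "系统", "内核"],
    ["操作", "系统", "崩溃"],
]
unigrams, bigrams = count_ngrams(sentences)
print(inner_unity("操作", "系统", unigrams, bigrams))
print(marginal_variety(("操作", "系统"), sentences))
```

A candidate would plausibly be kept only when both scores are high. How the paper actually combines or thresholds them is not stated in the abstract.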

Author information

Authors and Affiliations

Institute of Computational Linguistics, Peking University, 100871, Peking, China
Wei Kang & Zhifang Sui

Editor information

Editors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Wenjie Li
Division of Information and Communication Sciences, Macquarie University, NSW 2109, Sydney, Australia
Diego Mollá-Aliod

About this paper

Cite this paper

Kang, W., Sui, Z. (2009). Research on Automatic Chinese Multi-word Term Extraction Based on Term Component. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_6

DOI: https://doi.org/10.1007/978-3-642-00831-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)
