Dual Filtering Strategy for Chinese Term Extraction

Chen, Xiaoming; Li, Xuening; Hu, Yi; Lu, Ruzhan

doi:10.1007/11540007_97

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Abstract

Automatic term extraction (ATR) is an important problem in natural language processing. However, most extraction methods focus on multiword units, so many common words (or phrases) are inevitably extracted as terms as well. In this paper, we propose a hybrid method for automatically extracting terms from domain-specific, unannotated Chinese documents. The method combines linguistic knowledge with statistical techniques, applies a dual filtering strategy, and introduces a weight formula to filter term candidates. The results indicate that our system is more efficient and more precise than previous methods.

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Shanghai Jiao Tong Univ., Shanghai, 200030
Xiaoming Chen, Xuening Li, Yi Hu & Ruzhan Lu
School of Computer Science, Guizhou Univ., Guiyang, 55002, P.R. China
Xiaoming Chen

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin

About this paper

Cite this paper

Chen, X., Li, X., Hu, Y., Lu, R. (2005). Dual Filtering Strategy for Chinese Term Extraction. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_97

DOI: https://doi.org/10.1007/11540007_97
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)
