Abstract
Automatic term extraction (ATR, also known as automatic term recognition) is an important problem in natural language processing. However, most extraction methods focus on multiword units, and as a result many common words and phrases are inevitably extracted as terms alongside the genuine ones. In this paper, we propose a hybrid method for extracting terms from domain-specific, unannotated Chinese documents. The method combines linguistic knowledge with statistical techniques, adopts a dual filtering strategy, and introduces a weight formula to filter term candidates. Our results indicate that the system is more efficient and precise than previous methods.
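The abstract does not give the paper's filters or its weight formula, so the following is only a minimal sketch of what a generic two-stage ("dual") candidate filter could look like. Everything in it is an assumption for illustration: the hypothetical `COMMON_WORDS` stop list, the minimum length, and the `weight` function (log domain frequency times the log ratio of domain to general-corpus relative frequency) are stand-ins, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical stop list of common words that should never be accepted as terms.
COMMON_WORDS = {"问题", "方法", "研究", "系统"}


def linguistic_filter(candidates, common_words=COMMON_WORDS, min_len=2):
    """Stage 1 (assumed): drop candidates that are too short or are common words."""
    return [c for c in candidates if len(c) >= min_len and c not in common_words]


def weight(cand, domain_counts, general_counts, domain_total, general_total):
    """Stage 2 (assumed): a domain-specificity weight, NOT the paper's formula.

    Multiplies log frequency in the domain corpus by the log ratio of the
    candidate's relative frequency in the domain corpus to its add-one
    smoothed relative frequency in a general corpus.
    """
    n_domain = domain_counts.get(cand, 0)
    if n_domain == 0:
        return 0.0  # a candidate never seen in the domain corpus gets no weight
    p_domain = n_domain / domain_total
    p_general = (general_counts.get(cand, 0) + 1) / (general_total + 1)
    return math.log(1 + n_domain) * math.log(p_domain / p_general)


def dual_filter(candidates, domain_counts, general_counts, threshold=1.0):
    """Apply the linguistic filter, then keep candidates whose weight meets the
    threshold, ordered from highest to lowest weight."""
    stage1 = linguistic_filter(candidates)
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())
    scored = {
        c: weight(c, domain_counts, general_counts, domain_total, general_total)
        for c in stage1
    }
    kept = [c for c, w in scored.items() if w >= threshold]
    return sorted(kept, key=lambda c: scored[c], reverse=True)


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    domain = Counter({"神经网络": 30, "方法": 50, "支持向量机": 20, "今天": 5})
    general = Counter({"神经网络": 2, "方法": 400, "支持向量机": 1, "今天": 300})
    print(dual_filter(list(domain), domain, general))
```

With these toy counts, "方法" is removed by the stop list in the first stage, and "今天" is removed in the second stage because it is relatively more frequent in the general corpus than in the domain corpus. The two domain-specific strings survive. Splitting the filter this way lets cheap lexical rules discard obvious non-terms before any corpus statistics are computed.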
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, X., Li, X., Hu, Y., Lu, R. (2005). Dual Filtering Strategy for Chinese Term Extraction. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_97
DOI: https://doi.org/10.1007/11540007_97
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)