Chinese Terminology Extraction Using Window-Based Contextual Information

Ji, Luning; Sum, Mantai; Lu, Qin; Li, Wenjie; Chen, Yirong

doi:10.1007/978-3-540-70939-8_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Terminology extraction is an important task for automatically updating domain-specific knowledge. Contextual information helps decide whether newly extracted candidates are genuine terms. Because extraction based on fixed patterns is of very limited use for natural language text, both syntactic and semantic information in the context of a term is needed to determine its termhood. In this paper, we investigate two window-based context word extraction methods that take syntactic and semantic information into account. Based on the performance of each method individually, we propose a hybrid method that combines both syntactic and semantic information. Experiments show that the hybrid method achieves a significant improvement.
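The abstract does not spell out the extraction procedure. As a rough illustration of what window-based context word extraction generally involves, the sketch below counts the words that fall within a fixed window on either side of each occurrence of a candidate term in segmented, POS-tagged Chinese text. The function name `window_context_words`, the default window size, and the optional `keep_pos` tag filter are hypothetical choices made for this example. They are not the authors' syntactic or semantic criteria, and the proposed hybrid method is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def window_context_words(
    sentences: Iterable[List[Tuple[str, str]]],
    term: str,
    window: int = 3,
    keep_pos: Optional[Set[str]] = None,
) -> Counter:
    """Count words within `window` tokens on either side of each occurrence of `term`.

    Each sentence is assumed to be a list of (word, pos_tag) pairs produced by a
    word segmenter and tagger. If `keep_pos` is given, only context words whose
    tag is in that set are counted (a hypothetical, crude stand-in for syntactic
    filtering; not the paper's rule).
    """
    counts: Counter = Counter()
    for sent in sentences:
        for i, (word, _) in enumerate(sent):
            if word != term:
                continue
            # Clip the window to the sentence boundaries.
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # skip the term itself
                ctx_word, ctx_pos = sent[j]
                if keep_pos is None or ctx_pos in keep_pos:
                    counts[ctx_word] += 1
    return counts


if __name__ == "__main__":
    # Toy, hand-tagged sentences (illustrative data only).
    corpus = [
        [("训练", "v"), ("一个", "m"), ("神经网络", "n"), ("模型", "n")],
        [("神经网络", "n"), ("的", "u"), ("训练", "v"), ("算法", "n")],
    ]
    print(window_context_words(corpus, "神经网络", window=2).most_common())
    print(window_context_words(corpus, "神经网络", window=2, keep_pos={"n", "v"}))
```

The resulting context-word counts are the kind of evidence a termhood score could then be built on. How such counts are weighted or combined is exactly what the paper studies, so no scoring function is assumed here.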



Author information

Authors and Affiliations

The Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Luning Ji, Mantai Sum, Qin Lu, Wenjie Li & Yirong Chen


Editor information

Alexander Gelbukh


About this paper

Cite this paper

Ji, L., Sum, M., Lu, Q., Li, W., Chen, Y. (2007). Chinese Terminology Extraction Using Window-Based Contextual Information. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_6


DOI: https://doi.org/10.1007/978-3-540-70939-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)
