Abstract
Key-phrase extraction from patent documents has recently received considerable attention. However, current statistical approaches to Chinese key-phrase extraction do not capture semantics, which results in inaccurate and incomplete extraction. This study proposes a Chinese patent mining approach based on sememe statistics and key-phrase extraction to extract key-phrases from patent documents. The key-phrase extraction algorithm is based on the semantic knowledge structure of HowNet, and a statistical approach is adopted to calculate the chosen value of each phrase in the patent document. On an experimental data set, the proposed algorithm improved recall from 62% to 73% and precision from 72% to 81% compared with a term frequency statistics algorithm.
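The contrast drawn in the abstract, between counting terms and counting the sememes behind them, can be illustrated with a minimal sketch. The abstract does not give the formula for the chosen value, so the scoring rule below (summing the document frequency of a phrase's sememes) is an assumption made only for illustration, not the authors' method. The `TOY_SEMEMES` map is a hypothetical hand-written stand-in for HowNet, and all function names are invented for this example.

```python
from collections import Counter

# Hypothetical toy sememe map standing in for HowNet (illustrative entries only).
TOY_SEMEMES = {
    "engine": ["machine", "drive"],
    "motor": ["machine", "drive"],
    "valve": ["part", "control"],
    "method": ["abstract"],
}

def tf_scores(tokens):
    """Baseline: score each candidate phrase by its raw frequency."""
    return dict(Counter(tokens))

def sememe_scores(tokens, sememes=TOY_SEMEMES):
    """Assumed rule: score a phrase by the summed frequency of its sememes."""
    sememe_freq = Counter(s for t in tokens for s in sememes.get(t, []))
    return {t: sum(sememe_freq[s] for s in sememes.get(t, []))
            for t in set(tokens)}

def top_k(scores, k):
    """Return the k best phrases; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]

def precision_recall(extracted, gold):
    """Precision and recall of extracted phrases against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Toy document: "engine" and "motor" each occur once but share sememes.
tokens = ["engine", "motor", "valve", "method", "method"]
gold = ["engine", "motor"]

tf_top = top_k(tf_scores(tokens), 2)
sememe_top = top_k(sememe_scores(tokens), 2)
print(tf_top, precision_recall(tf_top, gold))
print(sememe_top, precision_recall(sememe_top, gold))
```

In this toy input, near-synonyms that each appear only once reinforce one another through their shared sememes, whereas raw term frequency favours the single repeated word. The numbers say nothing about the paper's reported results.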
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Jin, B., Teng, HF., Shi, YJ., Qu, FZ. (2007). Chinese Patent Mining Based on Sememe Statistics and Key-Phrase Extraction. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_48
DOI: https://doi.org/10.1007/978-3-540-73871-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)