Chinese Patent Mining Based on Sememe Statistics and Key-Phrase Extraction

Jin, Bo; Teng, Hong-Fei; Shi, Yan-Jun; Qu, Fu-Zheng

doi:10.1007/978-3-540-73871-8_48

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Abstract

Key-phrase extraction from patent documents has recently received considerable attention. However, current statistical approaches to Chinese key-phrase extraction do not incorporate semantic understanding, which results in inaccurate and incomplete extraction. This study proposes a Chinese patent mining approach based on sememe statistics and key-phrase extraction to extract key-phrases from patent documents. The key-phrase extraction algorithm builds on the semantic knowledge structure of HowNet, and a statistical approach is used to calculate the chosen value of each phrase in the patent document. On an experimental data set, the proposed algorithm improved recall from 62% to 73% and precision from 72% to 81% compared with a term frequency statistics algorithm.
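
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tiny hand-made word-to-sememe table (`TOY_SEMEMES`, with English stand-ins for Chinese words; HowNet's real inventory is not reproduced here) and an assumed score equal to the mean document frequency of a phrase's sememes. The function names `sememe_counts`, `score_phrases`, and `precision_recall` are hypothetical. The last function shows how precision and recall, the two measures reported above, are typically computed against a reference key-phrase set.

```python
from collections import Counter

# Hypothetical toy word-to-sememe table (assumption, not HowNet data).
TOY_SEMEMES = {
    "engine": ["machine", "power"],
    "motor": ["machine", "power"],
    "valve": ["part", "control"],
    "method": ["abstract"],
}


def sememe_counts(tokens, sememes):
    """Count how often each sememe is evoked by the tokens of a document."""
    counts = Counter()
    for tok in tokens:
        counts.update(sememes.get(tok, []))
    return counts


def score_phrases(tokens, candidates, sememes):
    """Score each candidate by the mean document frequency of its sememes.

    This scoring rule is an assumption made for illustration only.
    """
    counts = sememe_counts(tokens, sememes)
    scores = {}
    for phrase in candidates:
        sems = [s for word in phrase.split() for s in sememes.get(word, [])]
        scores[phrase] = sum(counts[s] for s in sems) / len(sems) if sems else 0.0
    return scores


def precision_recall(extracted, reference):
    """Precision and recall of extracted key-phrases against a reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


tokens = ["engine", "motor", "valve", "engine", "method"]
scores = score_phrases(tokens, ["engine", "motor", "valve", "method"], TOY_SEMEMES)
print(scores)
print(precision_recall(["engine", "motor", "valve"],
                       ["engine", "motor", "method", "pump"]))
```

In this toy input "motor" occurs only once, yet it receives the same score as "engine" because the two words share sememes. Pure term-frequency counting would rank them differently, which is the kind of gap a sememe-based statistic is meant to close.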

Author information

Authors and Affiliations

Department of Computer Science, Dalian Univ. of Tech., P.R. China
Bo Jin
School of Mechanical Engineering, Dalian Univ. of Tech., P.R. China
Hong-Fei Teng, Yan-Jun Shi & Fu-Zheng Qu
Key Laboratory for Precision and Non-traditional Machining Technology, Dalian Univ. of Tech., P.R. China
Fu-Zheng Qu

Editor information

Editors and Affiliations

Computer Science Department, University of Calgary, Calgary, AB, Canada
Reda Alhajj
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hong Gao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Jianzhong Li
School of Information Technology and Electronic Engineering, The University of Queensland, Queensland, Australia
Xue Li
Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane

About this paper

Cite this paper

Jin, B., Teng, HF., Shi, YJ., Qu, FZ. (2007). Chinese Patent Mining Based on Sememe Statistics and Key-Phrase Extraction. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_48

DOI: https://doi.org/10.1007/978-3-540-73871-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)
