Novel Word Features for Keyword Extraction

Chen, Yiqun; Yin, Jian; Zhu, Weiheng; Qiu, Shiding

doi:10.1007/978-3-319-21042-1_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9098)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Keyword extraction plays an increasingly important role in many areas of text-related research. Applications that rely on feature word selection include text mining and information retrieval. This paper introduces novel word features for keyword extraction, derived from background knowledge supplied by patent data. Given a document, we first generate a query from the key facts present in the document and use it to search the patent data for files closely related to the document’s contents. From the resulting set of patent files, the inventors, assignees, and citations recorded in each file are used to mine hidden knowledge and relationships between patent files. The related knowledge is then imported to extend the background knowledge base, from which the novel word features are derived. Because these features reflect the document’s background knowledge, they offer valuable indications of the importance of individual words in the input document and complement the traditional word features derivable from the document’s explicit information. Keyword extraction is then treated as a classification problem, and a Support Vector Machine (SVM) is used to extract the keywords. Experiments on two different data sets show that our method improves keyword extraction performance.
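
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. The abstract does not specify the query generator, the search engine, or the exact feature definitions, so every name here (`Patent`, `build_query`, `search`, `expand`, `background_feature`), the toy patent records, and the choice of "fraction of background patents that mention the word" as the feature are illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical patent record; the real patent data has far more fields.
@dataclass(frozen=True)
class Patent:
    pid: str
    text: str
    inventors: frozenset
    assignee: str
    cites: frozenset = frozenset()

STOPWORDS = {"a", "an", "the", "in", "of", "for", "and", "with", "each", "using"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_query(document, k=3):
    """Assumed query generator: the k most frequent non-stopword terms."""
    counts = Counter(t for t in tokenize(document) if t not in STOPWORDS)
    # Break frequency ties alphabetically so the result is deterministic.
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]

def search(query, patents):
    """Assumed retrieval step: patents containing at least one query term."""
    terms = set(query)
    return [p for p in patents if terms & set(tokenize(p.text))]

def expand(hits, patents):
    """Extend the hits with patents linked by inventor, assignee, or citation."""
    hit_ids = {p.pid for p in hits}
    background = list(hits)
    for p in patents:
        if p.pid in hit_ids:
            continue
        if any(p.inventors & h.inventors or p.assignee == h.assignee
               or p.pid in h.cites or h.pid in p.cites for h in hits):
            background.append(p)
    return background

def background_feature(word, background):
    """Assumed feature: fraction of background patents that mention the word."""
    if not background:
        return 0.0
    return sum(word in set(tokenize(p.text)) for p in background) / len(background)

# Toy data, invented purely for illustration.
document = "A keyword extraction method ranks each keyword in a document using a classifier."
patents = [
    Patent("P1", "Keyword ranking for search engines", frozenset({"A. Lee"}), "Acme", frozenset({"P3"})),
    Patent("P2", "Classifier training with support vectors", frozenset({"B. Wu"}), "Beta"),
    Patent("P3", "Term weighting with ranking for retrieval", frozenset({"C. Kim"}), "Gamma"),
    Patent("P4", "Battery cooling plate", frozenset({"D. Roe"}), "Delta"),
    Patent("P5", "Phrase extraction from text", frozenset({"B. Wu"}), "Omega"),
]

query = build_query(document)
hits = search(query, patents)
background = expand(hits, patents)
features = {w: background_feature(w, background) for w in set(tokenize(document)) - STOPWORDS}
```

In this toy run, "extraction" appears in none of the directly retrieved patents but does appear in one that shares an inventor with a hit, so the expanded background set gives it a nonzero score. In a full system, such values would presumably be combined with traditional word features and passed to an SVM classifier that labels each candidate word as keyword or non-keyword.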

Author information

Authors and Affiliations

Department of Computer Science, Sun Yat-sen University, Guangzhou, 510275, China
Yiqun Chen & Jian Yin
Department of Computer Science, Guangdong University of Education, Guangzhou, 510303, China
Yiqun Chen
College of Information Science Technology, Jinan University, Guangzhou, 510632, China
Weiheng Zhu & Shiding Qiu

Corresponding author

Correspondence to Jian Yin.

Editor information

Editors and Affiliations

Google, CA, USA
Xin Luna Dong
Postdoc Apartments (Hong Lou) 4-1-4, Shandong University, Li Cheng, Jinan, China
Xiaohui Yu
Tsinghua University, Beijing, China
Jian Li
Northeastern University, Boston, USA
Yizhou Sun

About this paper

Cite this paper

Chen, Y., Yin, J., Zhu, W., Qiu, S. (2015). Novel Word Features for Keyword Extraction. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_12

DOI: https://doi.org/10.1007/978-3-319-21042-1_12
Published: 06 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science, Computer Science (R0)
