Abstract
Keyword extraction plays an increasingly crucial role in many areas of text-related research. Applications that rely on feature word selection include text mining and information retrieval. This paper introduces novel word features for keyword extraction. The new features are derived from background knowledge supplied by patent data. To acquire the background knowledge for a given document, we first generate a query from the key facts present in the document and use it to search the patent data for files closely related to the document's contents. Within the resulting set of patent files, the inventor, assignee, and citation information in each file is used to mine hidden knowledge and relationships between different patent files. The related knowledge is then imported to extend the background knowledge base, from which the novel word features are derived. Because they reflect the document's background knowledge, the new features offer valuable indications of the importance of individual words in the input document and complement the traditional word features derivable from the document's explicit information. Keyword extraction can then be treated as a classification problem, and a Support Vector Machine (SVM) is used to extract the keywords. Experiments on two different data sets show that our method improves the performance of keyword extraction.
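The classification framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `word_features`, the three features it computes, and the toy texts are all hypothetical, and the background feature is simply the fraction of retrieved background documents that contain a word. It stands in for the patent-derived features, whose exact definitions the abstract does not give.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_features(document, background_docs):
    """Build one feature vector per distinct word in `document`.

    Hypothetical features, chosen only for illustration:
      [0] term frequency within the document
      [1] relative position of the word's first occurrence
      [2] fraction of background documents containing the word
          (a stand-in for features derived from patent search results)
    """
    tokens = tokenize(document)
    counts = Counter(tokens)
    n = len(tokens)

    # Record the index of the first occurrence of each word.
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)

    background_sets = [set(tokenize(d)) for d in background_docs]

    features = {}
    for word, count in counts.items():
        tf = count / n
        pos = first_pos[word] / n
        if background_sets:
            bg = sum(word in s for s in background_sets) / len(background_sets)
        else:
            bg = 0.0
        features[word] = [tf, pos, bg]
    return features


# Toy input; in the described method the background documents would be
# the patent files returned by the generated query.
document = "keyword extraction selects keyword candidates from text"
background = [
    "a method for keyword extraction from documents",
    "text retrieval apparatus",
]
feats = word_features(document, background)
for word, vector in feats.items():
    print(word, vector)
```

Each vector, paired with a keyword or non-keyword label, would form one training example for a binary classifier such as an SVM.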
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, Y., Yin, J., Zhu, W., Qiu, S. (2015). Novel Word Features for Keyword Extraction. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_12
DOI: https://doi.org/10.1007/978-3-319-21042-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science (R0)