Novel Word Features for Keyword Extraction

  • Conference paper
Web-Age Information Management (WAIM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9098)

Abstract

Keyword extraction plays an increasingly important role in many areas of text-related research. Applications that rely on selecting feature words include text mining and information retrieval. This paper introduces novel word features for keyword extraction, derived from background knowledge supplied by patent data. To acquire the background knowledge for a given document, we first generate a query from the key facts present in the document and use it to search the patent data for files closely related to the document’s contents. From the resulting set of patent files, the inventor, assignee, and citation information in each file is used to mine hidden knowledge and relationships between patent files. This related knowledge is then imported to extend the background knowledge base, from which the novel word features are derived. Because they reflect the document’s background knowledge, the new features offer valuable indications of the importance of individual words in the input document and complement the traditional word features derivable from the document’s explicit information. Keyword extraction can then be treated as a classification problem, and a Support Vector Machine (SVM) is used to extract the keywords. Experiments on two different data sets show that our method improves keyword extraction performance.
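As a rough illustration of how traditional and background-derived word features might sit side by side, the sketch below builds one feature vector per candidate word. It is not the paper's method: the function names, the three features chosen, and the use of a plain list of strings as "background documents" are all assumptions made for illustration. In particular, the background feature here is a simple document-frequency ratio, whereas the abstract describes features mined from patent inventors, assignees, and citations.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_features(document, background_docs):
    """Return {word: (tf, first_pos, background_df)} for each distinct word.

    tf            : relative term frequency in the document (traditional)
    first_pos     : relative position of the first occurrence (traditional)
    background_df : fraction of background documents containing the word
                    (hypothetical stand-in for a background-knowledge feature)
    """
    tokens = tokenize(document)
    if not tokens:
        return {}

    counts = Counter(tokens)

    # Record the index at which each word first appears.
    first_index = {}
    for i, tok in enumerate(tokens):
        first_index.setdefault(tok, i)

    background_sets = [set(tokenize(d)) for d in background_docs]

    features = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        first_pos = first_index[word] / len(tokens)
        if background_sets:
            bg = sum(word in s for s in background_sets) / len(background_sets)
        else:
            bg = 0.0
        features[word] = (tf, first_pos, bg)
    return features


# Toy input; the background list stands in for retrieved patent files.
document = "Keyword extraction selects keyword candidates from text"
background = [
    "A method for keyword extraction from patents",
    "Patent search using keyword queries",
]
features = word_features(document, background)
```

Under the classification view described in the abstract, vectors like these, paired with keyword/non-keyword labels, would then be passed to a classifier such as an SVM.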

Author information

Correspondence to Jian Yin.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Y., Yin, J., Zhu, W., Qiu, S. (2015). Novel Word Features for Keyword Extraction. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_12

  • DOI: https://doi.org/10.1007/978-3-319-21042-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21041-4

  • Online ISBN: 978-3-319-21042-1

  • eBook Packages: Computer Science, Computer Science (R0)
