
Keyphrase Extraction Using Knowledge Graphs

  • Conference paper
Web and Big Data (APWeb-WAIM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10366)

Abstract

Extracting keyphrases from documents automatically is an important and interesting task, since keyphrases provide a quick summary of a document. Although much effort has been devoted to keyphrase extraction, most existing methods (co-occurrence-based methods and statistics-based methods) do not take semantics fully into account. Co-occurrence-based methods depend heavily on the co-occurrence relations between two words in the input document, and may therefore miss many semantic relations. Statistics-based methods exploit an external text corpus to enrich the document, which inevitably introduces unrelated relations. In this paper, we propose a novel approach that extracts keyphrases using knowledge graphs, which allow us to detect latent relations between two keyterms (i.e., nouns and named entities) without introducing much noise. Extensive experiments on real data show that our method outperforms state-of-the-art methods, including graph-based co-occurrence methods and statistics-based clustering methods.
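The contrast the abstract draws, between edges found by co-occurrence in the text and latent relations found through a knowledge graph, can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a toy knowledge graph stored as a hypothetical adjacency dict, treats two keyterms as related when they are connected within `max_hops` edges in that graph, and ranks keyterms with plain PageRank, a common choice in graph-based keyphrase extraction such as TextRank, not necessarily the paper's ranking step. All function names here are hypothetical.

```python
from collections import deque
from itertools import combinations


def cooccurrence_edges(tokens, keyterms, window=2):
    """TextRank-style edges: link two keyterms that occur fewer than `window` positions apart."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in keyterms]
    edges = set()
    for (i, a), (j, b) in combinations(positions, 2):  # j > i because order is preserved
        if a != b and j - i < window:
            edges.add(frozenset((a, b)))
    return edges


def within_hops(kg, source, target, max_hops):
    """Breadth-first search: True if `target` is reachable from `source` in at most `max_hops` edges."""
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, dist = frontier.popleft()
        if node == target:
            return True
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in kg.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return False


def kg_edges(kg, keyterms, max_hops=2):
    """Hypothetical knowledge-graph edges: link keyterms connected by a short path in `kg`."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(keyterms), 2)
        if within_hops(kg, a, b, max_hops)
    }


def pagerank(nodes, edges, d=0.85, iters=50):
    """Plain PageRank by power iteration on an undirected graph (isolated nodes keep (1-d)/n)."""
    neighbors = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    score = {v: 1.0 / n for v in neighbors}
    for _ in range(iters):
        # Synchronous update: the comprehension reads the previous iteration's scores.
        score = {
            v: (1 - d) / n + d * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    return score


if __name__ == "__main__":
    tokens = ["Jobs", "founded", "Apple", ".", "The", "iPhone", "changed", "phones", "."]
    keyterms = {"Jobs", "Apple", "iPhone"}
    # Toy knowledge graph (assumption): undirected adjacency lists.
    kg = {"Jobs": ["Apple"], "Apple": ["Jobs", "iPhone"], "iPhone": ["Apple"]}

    print(sorted(sorted(e) for e in cooccurrence_edges(tokens, keyterms, window=3)))
    print(sorted(sorted(e) for e in kg_edges(kg, keyterms, max_hops=1)))
    scores = pagerank(sorted(keyterms), kg_edges(kg, keyterms, max_hops=1))
    print(max(scores, key=scores.get))
```

In this toy input, the co-occurrence window misses the Apple–iPhone relation because the two terms sit in different sentences, while the knowledge-graph path recovers it. Only the graph-construction step differs between the two variants; the ranking step is shared, which keeps the comparison focused on where the edges come from.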


Notes

  1. http://www-nlpir.nist.gov/projects/duc/past_duc/duc2001/data.html.



Acknowledgements

This work is supported by the Research Grants Council of Hong Kong SAR (No. 14221716), The Chinese University of Hong Kong Direct Grant (No. 4055048), NSFC (grant Nos. 61622201, 61532010, 61370055 and 61402020), and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20130001120021).

Author information

Corresponding author

Correspondence to Weiguo Zheng.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shi, W., Zheng, W., Yu, J.X., Cheng, H., Zou, L. (2017). Keyphrase Extraction Using Knowledge Graphs. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_11


  • DOI: https://doi.org/10.1007/978-3-319-63579-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63578-1

  • Online ISBN: 978-3-319-63579-8

  • eBook Packages: Computer Science, Computer Science (R0)
