Keyphrase Extraction Using Knowledge Graphs

Shi, Wei; Zheng, Weiguo; Yu, Jeffrey Xu; Cheng, Hong; Zou, Lei

doi:10.1007/978-3-319-63579-8_11

Wei Shi¹⁸,
Weiguo Zheng¹⁸,
Jeffrey Xu Yu¹⁸,
Hong Cheng¹⁸ &
…
Lei Zou¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10366))

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data

1964 Accesses
6 Citations

Abstract

Extracting keyphrases from documents automatically is an important and interesting task since keyphrases provide a quick summarization for documents. Although lots of efforts have been made on keyphrase extraction, most of the existing methods (the co-occurrence based methods and the statistic-based methods) do not take semantics into full consideration. The co-occurrence based methods heavily depend on the co-occurrence relations between two words in the input document, which may ignore many semantic relations. The statistic-based methods exploit the external text corpus to enrich the document, which introduces more unrelated relations inevitably. In this paper, we propose a novel approach to extract keyphrases using knowledge graphs, based on which we could detect the latent relations of two keyterms (i.e., noun words and named entities) without introducing many noises. Extensive experiments over real data show that our method outperforms the state-of-art methods including the graph-based co-occurrence methods and statistic-based clustering methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
http://www-nlpir.nist.gov/projects/duc/past_duc/duc2001/data.html.

References

Wikipedia. http://en.wikipedia.org/
Bavelas, A.: Communication patterns in task-oriented groups. J. Acoust. Soc. Am. 22(6), 725–730 (1950)
Article Google Scholar
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. J. Web Sem. 7(3), 154–165 (2009)
Article Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
MATH Google Scholar
Boudin, F.: A comparison of centrality measures for graph-based keyphrase extraction. In: IJCNLP 2013, pp. 834–838 (2013)
Google Scholar
Cilibrasi, R., Vitányi, P.M.B.: The Google similarity distance (2004). CoRR, abs/cs/0412098
Google Scholar
Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40, 35–41 (1977)
Article Google Scholar
Grineva, M.P., Grinev, M.N., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: WWW 2009, pp. 661–670 (2009)
Google Scholar
Hammouda, K.M., Matute, D.N., Kamel, M.S.: CorePhrase: keyphrase extraction for document clustering. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS, vol. 3587, pp. 265–274. Springer, Heidelberg (2005). doi:10.1007/11510888_26
Chapter Google Scholar
Haveliwala, T.H.: Topic-sensitive PageRank. In: WWW 2002, pp. 517–526 (2002)
Google Scholar
Huang, C., Tian, Y., Zhou, Z., Ling, C.X., Huang, T.: Keyphrase extraction using semantic networks structure analysis. In: ICDM 2006, pp. 275–284 (2006)
Google Scholar
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: EMNLP 2003, pp. 216–223 (2003)
Google Scholar
Hulth, A.: Reducing false positives by expert combination in automatic keyword indexing. In: RANLP 2003, pp. 367–376 (2003)
Google Scholar
Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 23–26 July 2002, pp. 538–543 (2002)
Google Scholar
Jiang, X., Hu, Y., Li, H.: A ranking approach to keyphrase extraction. In: SIGIR 2009, pp. 756–757 (2009)
Google Scholar
Kusner, M.J., Sun, Y., Kolkin, N.I., Weinberger, K.Q.: From word embeddings to document distances. In: ICML 2015, pp. 957–966 (2015)
Google Scholar
Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: EMNLP, pp. 366–376 (2010)
Google Scholar
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: EMNLP 2009, pp. 257–266 (2009)
Google Scholar
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–136 (1982)
Article MathSciNet MATH Google Scholar
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: ACL 2014, pp. 55–60 (2014)
Google Scholar
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: EMNLP 2004, pp. 404–411 (2004)
Google Scholar
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR (2013)
Google Scholar
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web (1999)
Google Scholar
Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach: the Intelligent Agent Book. Prentice Hall Series in Artificial Intelligence. Prentice Hall, Englewood Cliffs (1995)
MATH Google Scholar
Tsatsaronis, G., Varlamis, I., Nørvåg, K.: SemanticRank: ranking keywords and sentences using semantic graphs. In: COLING 2010, pp. 1074–1082 (2010)
Google Scholar
Turney, P.D.: Learning to extract keyphrases from text (2002). CoRR, cs.LG/0212013
Google Scholar
Wan, X., Xiao, J.: Exploiting neighborhood knowledge for single document summarization and keyphrase extraction. ACM Trans. Inf. Syst. 28(2), 8 (2010)
Article Google Scholar
Wan, X., Yang, J., Xiao, J.: Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In: ACL 2007, vol. 7, pp. 552–559 (2007)
Google Scholar
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: KEA: practical automatic keyphrase extraction. In: Proceedings of the Fourth ACM Conference on Digital Libraries, pp. 254–255 (1999)
Google Scholar
Youn, E., Jeong, M.K.: Class dependent feature scaling method using naive Bayes classifier for text datamining. Pattern Recogn. Lett. 30(5), 477–485 (2009)
Article Google Scholar

Download references

Acknowledgements

This work is supported by Research Grant Council of Hong Kong SAR No. 14221716 and The Chinese University of Hong Kong Direct Grant No. 4055048 and NSFC under grant Nos. 61622201, 61532010, 61370055, 61402020 and Ph.D. Programs Foundation of Ministry of Education of China No. 20130001120021.

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Sha Tin, Hong Kong
Wei Shi, Weiguo Zheng, Jeffrey Xu Yu & Hong Cheng
Peking University, Beijing, China
Lei Zou

Authors

Wei Shi
View author publications
You can also search for this author in PubMed Google Scholar
Weiguo Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Jeffrey Xu Yu
View author publications
You can also search for this author in PubMed Google Scholar
Hong Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Lei Zou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Weiguo Zheng .

Editor information

Editors and Affiliations

Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Computer Science, Aarhus University, Aarhus N, Denmark
Christian S. Jensen
Computer Science, University of Southern California, Los Angeles, California, USA
Cyrus Shahabi
Northeastern University, Shenyang, China
Xiaochun Yang
Kent State University, Kent, Ohio, USA
Xiang Lian

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shi, W., Zheng, W., Yu, J.X., Cheng, H., Zou, L. (2017). Keyphrase Extraction Using Knowledge Graphs. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_11

Download citation

DOI: https://doi.org/10.1007/978-3-319-63579-8_11
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63578-1
Online ISBN: 978-3-319-63579-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics