WEKE: Learning Word Embeddings for Keyphrase Extraction

  • Conference paper
  • First Online:
Web and Big Data (APWeb-WAIM 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12318)

Abstract

Traditional supervised keyphrase extraction models depend on features derived from labeled keyphrases, while prevailing unsupervised models rely mainly on the global structure of the word graph, whose nodes represent candidate words and whose edges capture co-occurrence between words. However, existing unsupervised graph-based keyphrase extraction methods cannot exploit the local context information of the word graph, and integrating different types of information into a unified model remains relatively unexplored. In this paper, we propose a new word embedding model designed specifically for the keyphrase extraction task. It captures local context information and incorporates it, together with other crucial types of information, into low-dimensional word vectors that help extract keyphrases more accurately. Experimental results show that our method consistently outperforms seven state-of-the-art unsupervised methods on three real-world computer science datasets.
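For context, the word graph the abstract refers to (nodes are candidate words, edges encode co-occurrence) is the backbone of prevailing unsupervised methods such as TextRank. The sketch below illustrates that baseline ranking scheme only, not the proposed WEKE model; the networkx dependency, the sliding-window size, and the toy document are illustrative assumptions.

# Minimal sketch (not the WEKE model): rank candidate words by PageRank over a
# co-occurrence graph, in the style of the graph-based unsupervised methods the
# abstract describes. Library choice and window size are assumptions.
import networkx as nx

def rank_words(tokens, window=4):
    """Build a co-occurrence graph and return a word -> importance score dict."""
    graph = nx.Graph()
    graph.add_nodes_from(set(tokens))
    for i, word in enumerate(tokens):
        # Link each word to the words co-occurring with it in a sliding window.
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph.add_edge(word, other)
    return nx.pagerank(graph)

if __name__ == "__main__":
    doc = ("unsupervised keyphrase extraction builds a word graph and ranks "
           "candidate words by their centrality in that graph").split()
    scores = rank_words(doc)
    print(sorted(scores, key=scores.get, reverse=True)[:5])

Top-ranked words are then typically merged into multi-word candidate phrases; WEKE replaces the purely global scoring step with scores informed by the learned word embeddings.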


Notes

  1. http://www.nltk.org/
  2. http://tartarus.org/martin/PorterStemmer/
  3. https://github.com/facebookresearch/fastText
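Footnotes 1-3 point to the toolchain (NLTK, the Porter stemmer, fastText) commonly used around models like this. The sketch below shows one conventional way these tools fit a keyphrase-extraction pipeline; it is an assumption for illustration, not the configuration used in the paper, and the corpus file name, POS filter, and embedding dimensionality are placeholders.

# Illustration only: one conventional way the footnoted tools are combined.
# Requires the NLTK 'punkt' and 'averaged_perceptron_tagger' resources and a
# local corpus file (placeholder name below).
import nltk                           # footnote 1: tokenisation and POS tagging
from nltk.stem import PorterStemmer   # footnote 2: Porter stemming
import fasttext                       # footnote 3: subword-aware word embeddings

stemmer = PorterStemmer()

def candidate_words(text):
    """Keep nouns and adjectives (a common candidate filter), stemmed."""
    tokens = nltk.word_tokenize(text.lower())
    return [stemmer.stem(word)
            for word, tag in nltk.pos_tag(tokens)
            if tag.startswith(("NN", "JJ"))]

# Train skip-gram fastText vectors on the target corpus; the file name and
# dimensionality are placeholders, not the paper's settings.
model = fasttext.train_unsupervised("corpus.txt", model="skipgram", dim=100)
print(candidate_words("Learning word embeddings for keyphrase extraction."),
      model.get_word_vector("embeddings")[:5])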


Acknowledgements

This work was partially supported by grants from the National Natural Science Foundation of China (Nos. U1933114, 61573231) and Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2018004).

Author information

Corresponding author

Correspondence to Yuxiang Zhang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Liu, H., Shi, B., Li, X., Wang, S. (2020). WEKE: Learning Word Embeddings for Keyphrase Extraction. In: Wang, X., Zhang, R., Lee, Y.K., Sun, L., Moon, Y.S. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12318. Springer, Cham. https://doi.org/10.1007/978-3-030-60290-1_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-60290-1_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60289-5

  • Online ISBN: 978-3-030-60290-1

  • eBook Packages: Computer Science, Computer Science (R0)
