Using Word Embeddings to Enhance Keyword Identification for Scientific Publications

Wang, Rui; Liu, Wei; McDonald, Chris

doi:10.1007/978-3-319-19548-3_21

Rui Wang¹⁶,
Wei Liu¹⁶ &
Chris McDonald¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9093))

Included in the following conference series:

Australasian Database Conference

1905 Accesses
20 Citations

Abstract

Automatic keyword identification is a desirable but difficult task. It requires considerations of not only the extraction of important words or phrases from a text, but also the generation of abstractive ones that do not appear in the text. In this paper, we propose an approach that uses word embedding vectors as an external knowledge base for both keyword extraction and generation. Our evaluation shows that our approach outperforms many baseline algorithms, and is comparable to the state-of-the-art algorithm on our chosen dataset. In addition, we also introduce a new approach for evaluating the task of keyword extraction, that overcomes a common problem of overly strict matching criteria. We show that using word embedding vectors is a simpler, yet effective, method for both keyword extraction and generation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. Journal of documentation 28(1), 11–21 (1972)
Article Google Scholar
Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain (2004)
Google Scholar
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. AAAI 8, 855–860 (2008)
Google Scholar
Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools 13(01), 157–169 (2004)
Article Google Scholar
Hasan, K.S., Ng, V.: Automatic keyphrase extraction: A survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1262–1273 (2014)
Google Scholar
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM (1998)
Google Scholar
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12, 2493–2537 (2011)
MATH Google Scholar
Mnih, A., Hinton, G.: Three new graphical models for statistical language modelling. In: Proceedings of the 24th International Conference on Machine Learning, pp. 641–648. ACM (2007)
Google Scholar
Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 384–394 (2010)
Google Scholar
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems 30(1), 107–117 (1998)
Article Google Scholar
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, Association for Computational Linguistics, pp. 257–266 (2009)
Google Scholar
Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, pp. 661–670. ACM (2009)
Google Scholar
Wang, J., Liu, J., Wang, C.: Keyword extraction based on pagerank. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 857–864. Springer, Heidelberg (2007)
Chapter Google Scholar
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Google Scholar
Liu, Z., Chen, X., Zheng, Y., Sun, M.: Automatic keyphrase extraction by bridging vocabulary gap. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, pp. 135–144 (2011)
Google Scholar
Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computational linguistics 19(2), 263–311 (1993)
Google Scholar
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, Association for Computational Linguistics, pp. 423–430 (2003)
Google Scholar
Grolmusz, V.: A note on the pagerank of undirected graphs (2012). arXiv preprint arXiv:1205.1960
Xing, W., Ghorbani, A.: Weighted pagerank algorithm. In: 2004 Proceedings of the Second Annual Conference on Communication Networks and Services Research, pp. 305–314. IEEE (2004)
Google Scholar
Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. Text Mining (2010)
Google Scholar
Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 366–376 (2010)
Google Scholar
Hasan, K.S., Ng, V.: Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Association for Computational Linguistics, pp. 365–373 (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
Rui Wang, Wei Liu & Chris McDonald

Authors

Rui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Wei Liu
View author publications
You can also search for this author in PubMed Google Scholar
Chris McDonald
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Wei Liu .

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Queensland, Australia
Mohamed A. Sharaf
Monash University, Clayton, Australia
Muhammad Aamir Cheema
The University of Melbourne, Melbourne, Australia
Jianzhong Qi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, R., Liu, W., McDonald, C. (2015). Using Word Embeddings to Enhance Keyword Identification for Scientific Publications. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science(), vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_21

Download citation

DOI: https://doi.org/10.1007/978-3-319-19548-3_21
Published: 28 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics