Abstract
Keyphrase generation aims to produce the phrases that capture the primary content of a document. An external domain-specific gazetteer can assist in generating keyphrases that are literally absent from the document (i.e., do not match any contiguous sub-sequence of the source text) but relevant to its content. In this paper, we present a technique to integrate knowledge from a gazetteer in order to improve keyphrase generation from research papers. We also present a copy mechanism that helps our model use the gazetteer vocabulary to handle out-of-vocabulary words in keyphrases. Since constructing and maintaining a relevant, high-quality gazetteer by hand is very expensive, we also propose a method for automatically constructing a gazetteer for a given input document by leveraging similar documents in the training corpus. The gazetteer constructed in this way helps the model focus on corpus-level information carried by other similar documents. Although this external information is crucial, it has not been considered in previous studies. Experiments on real-world datasets of research papers demonstrate that our proposed approach improves the performance of state-of-the-art keyphrase generation models.
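The idea of building a gazetteer from similar training documents can be illustrated with a small sketch. The abstract does not say how similarity is measured or how many neighbours are used, so the TF-IDF cosine similarity, the top-k cutoff, and all function names below (`tokenize`, `tfidf_vectors`, `cosine`, `build_gazetteer`) are assumptions made for illustration only, not the method of the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(token_lists)
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_gazetteer(query_text, train_docs, k=2):
    """Collect the keyphrases of the k training documents most similar to the query.

    train_docs is a list of (text, keyphrases) pairs. Using TF-IDF cosine
    similarity here is an assumption of this sketch.
    """
    token_lists = [tokenize(text) for text, _ in train_docs]
    token_lists.append(tokenize(query_text))
    vectors = tfidf_vectors(token_lists)
    query_vec = vectors[-1]
    ranked = sorted(
        range(len(train_docs)),
        key=lambda i: cosine(query_vec, vectors[i]),
        reverse=True,
    )
    gazetteer = []
    for i in ranked[:k]:
        for phrase in train_docs[i][1]:
            if phrase not in gazetteer:  # keep first occurrence, preserve order
                gazetteer.append(phrase)
    return gazetteer


if __name__ == "__main__":
    # Toy training corpus, invented for this example.
    corpus = [
        ("neural keyphrase generation with attention and copy mechanism",
         ["keyphrase generation", "copy mechanism"]),
        ("named entity recognition with gazetteer features",
         ["named entity recognition", "gazetteer"]),
        ("protein folding simulation with molecular dynamics",
         ["protein folding", "molecular dynamics"]),
    ]
    print(build_gazetteer("keyphrase generation from research papers using a gazetteer", corpus))
```

In this toy setting the gazetteer can contain phrases such as "copy mechanism" that never appear in the query text, which is the kind of absent but relevant candidate the abstract refers to.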
Acknowledgements
This work is supported by the National Digital Library of India Project, sponsored by the Ministry of Human Resource Development, Government of India, at IIT Kharagpur and Faculty Research Grant, IACS.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Santosh, T.Y.S.S., Sanyal, D.K., Bhowmick, P.K., Das, P.P. (2021). Gazetteer-Guided Keyphrase Generation from Research Papers. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_52
DOI: https://doi.org/10.1007/978-3-030-75762-5_52
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science, Computer Science (R0)