Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization

Liu, Tingyi; Iwaihara, Mizuho

doi:10.1007/978-3-030-91669-5_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. It is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models is advantageous through its contextualized text representations. In this paper, we show an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision, such that sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank achieves superior performance over the compared baseline methods on four well-known benchmark collections, especially on long documents.
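
The distant-supervision step can be illustrated with a minimal sketch. The abstract does not specify the lexical similarity measure or how many sentences are taken as positives, so the token-overlap score, the `top_k` cutoff, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(sentence, keyphrases):
    """Assumed similarity: fraction of distinct keyphrase tokens found in the sentence."""
    key_tokens = {tok for phrase in keyphrases for tok in tokenize(phrase)}
    if not key_tokens:
        return 0.0
    sent_tokens = set(tokenize(sentence))
    return len(key_tokens & sent_tokens) / len(key_tokens)


def label_sentences(sentences, keyphrases, top_k=2):
    """Label the top_k highest-scoring sentences as positive (1), all others as negative (0).

    Sentences with zero overlap are never labeled positive. The top_k cutoff
    is a hypothetical choice, not taken from the paper.
    """
    scores = [overlap_score(s, keyphrases) for s in sentences]
    # Sort by descending score; ties are broken by original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    positives = {i for i in ranked[:top_k] if scores[i] > 0}
    return [1 if i in positives else 0 for i in range(len(sentences))]


sentences = [
    "Keyphrase extraction selects phrases that represent a document.",
    "The weather was pleasant during the conference.",
    "We fine-tune BERT as a cross-encoder to rank candidate phrases.",
]
keyphrases = ["keyphrase extraction", "BERT", "cross-encoder"]

labels = label_sentences(sentences, keyphrases, top_k=2)
print(labels)  # [1, 0, 1]
```

Labels of this kind could then serve as training targets for a sentence-level summarizer, whose selected sentences are passed to a phrase ranker in a second stage.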

Author information

Authors and Affiliations

Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, 808-0135, Japan
Tingyi Liu & Mizuho Iwaihara

Corresponding author

Correspondence to Mizuho Iwaihara.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Hao-Ren Ke
Nanyang Technological University, Singapore, Singapore
Chei Sian Lee
Kyoto University, Kyoto, Japan
Kazunari Sugiyama

About this paper

Cite this paper

Liu, T., Iwaihara, M. (2021). Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_13

DOI: https://doi.org/10.1007/978-3-030-91669-5_13
Published: 30 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science, Computer Science (R0)
