From Fundamentals to Recent Advances: A Tutorial on Keyphrasification

Meng, Rui; Mahata, Debanjan; Boudin, Florian

doi:10.1007/978-3-030-99739-7_73

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Keyphrases capture the most important information in a text and often serve as a surrogate for efficiently summarizing documents. With the advancement of deep neural networks, recent years have witnessed rapid development in the automatic identification of keyphrases. The performance of keyphrase extraction methods has been greatly improved by progress in natural language understanding, which also enables models to predict relevant phrases not mentioned in the text. We name the task of summarizing texts with phrases keyphrasification.

In this half-day tutorial, we provide a comprehensive overview of keyphrasification as well as hands-on practice with popular models and tools. The tutorial covers topics ranging from the basics of the task to advanced topics and applications. By the end of the tutorial, participants will have a better understanding of 1) classical and state-of-the-art keyphrasification methods, 2) current evaluation practices and their issues, and 3) current trends and future directions in keyphrasification research. Tutorial-related resources are available at https://keyphrasification.github.io/.

Notes

1. https://github.com/boudinfl/pke

Acknowledgments

Florian Boudin is partially supported by the French National Research Agency through the DELICES project (ANR-19-CE38-0005-01). Rui Meng was partially supported by the Amazon Research Awards for the project “Transferable, Controllable, Applicable Keyphrase Generation” and by the University of Pittsburgh Center for Research Computing through the resources provided.

Author information

Authors and Affiliations

Salesforce Research, Palo Alto, USA
Rui Meng
Moody’s Analytics, New York, USA
Debanjan Mahata
LS2N, Nantes Université, Nantes, France
Florian Boudin

Corresponding author

Correspondence to Rui Meng.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

Cite this paper

Meng, R., Mahata, D., Boudin, F. (2022). From Fundamentals to Recent Advances: A Tutorial on Keyphrasification. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_73

DOI: https://doi.org/10.1007/978-3-030-99739-7_73
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)
