From Fundamentals to Recent Advances: A Tutorial on Keyphrasification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13186))

Abstract

Keyphrases capture the most important information in a text and often serve as surrogates for efficiently summarizing documents. With the advancement of deep neural networks, recent years have witnessed rapid development in the automatic identification of keyphrases. The performance of keyphrase extraction methods has been greatly improved by progress in natural language understanding, which enables models to predict relevant phrases that are not mentioned in the text. We name the task of summarizing texts with phrases keyphrasification.

In this half-day tutorial, we provide a comprehensive overview of keyphrasification as well as hands-on practice with popular models and tools. The tutorial covers topics ranging from the basics of the task to advanced topics and applications. By the end of the tutorial, participants will have a better understanding of 1) classical and state-of-the-art keyphrasification methods, 2) current evaluation practices and their issues, and 3) current trends and future directions in keyphrasification research. Tutorial-related resources are available at https://keyphrasification.github.io/.
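
To make the distinction between phrases mentioned in the text and phrases not mentioned in it concrete, the following is a minimal sketch, not the tutorial's own code. It splits reference keyphrases into those that occur verbatim in a document and those that do not, and scores a ranked prediction list with exact-match F1 at a cutoff k. The function names (`tokenize`, `split_present_absent`, `f1_at_k`) and the example strings are hypothetical, and matching here is on lowercased alphanumeric tokens without stemming, which is an assumption made to keep the example dependency-free.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_present_absent(text: str, keyphrases: List[str]) -> Tuple[List[str], List[str]]:
    """Split keyphrases by whether their tokens occur contiguously in the text."""
    tokens = tokenize(text)
    present, absent = [], []
    for phrase in keyphrases:
        words = tokenize(phrase)
        n = len(words)
        found = n > 0 and any(
            tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
        )
        (present if found else absent).append(phrase)
    return present, absent


def f1_at_k(predicted: List[str], gold: List[str], k: int) -> float:
    """Exact-match F1 between the top-k predictions and the gold keyphrases."""
    top_k = [" ".join(tokenize(p)) for p in predicted[:k]]
    gold_set = {" ".join(tokenize(g)) for g in gold}
    if not top_k or not gold_set:
        return 0.0
    correct = len(set(top_k) & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical document and keyphrases, for illustration only.
    doc = "We study graph-based ranking for keyphrase extraction from scientific articles."
    gold = ["keyphrase extraction", "graph-based ranking", "information retrieval"]
    predicted = ["keyphrase extraction", "scientific articles", "information retrieval"]
    print(split_present_absent(doc, gold))
    print(f1_at_k(predicted, gold, k=2))
```

Exact matching is deliberately strict: a prediction such as "keyphrase extractor" would count as an error against "keyphrase extraction". Strictness of this kind is one reason matching-based evaluation of keyphrases is often considered imperfect.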

Notes

  1. https://github.com/boudinfl/pke

Acknowledgments

Florian Boudin is partially supported by the French National Research Agency through the DELICES project (ANR-19-CE38-0005-01). Rui Meng was partially supported by the Amazon Research Awards for the project “Transferable, Controllable, Applicable Keyphrase Generation” and by the University of Pittsburgh Center for Research Computing through the resources provided.

Author information

Corresponding author

Correspondence to Rui Meng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Meng, R., Mahata, D., Boudin, F. (2022). From Fundamentals to Recent Advances: A Tutorial on Keyphrasification. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_73

  • DOI: https://doi.org/10.1007/978-3-030-99739-7_73

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science (R0)
