Abstract
Keyphrases capture the most important information in a text and often serve as a surrogate for efficiently summarizing documents. With the advancement of deep neural networks, recent years have witnessed rapid development in the automatic identification of keyphrases. The performance of keyphrase extraction methods has been greatly improved by progress in natural language understanding, which enables models to predict relevant phrases that are not mentioned in the text. We call the task of summarizing texts with phrases keyphrasification.
In this half-day tutorial, we provide a comprehensive overview of keyphrasification as well as hands-on practice with popular models and tools. The tutorial covers topics ranging from the basics of the task to advanced topics and applications. By the end of the tutorial, participants will have a better understanding of 1) classical and state-of-the-art keyphrasification methods, 2) current evaluation practices and their issues, and 3) current trends and future directions in keyphrasification research. Tutorial-related resources are available at https://keyphrasification.github.io/.
Acknowledgments
Florian Boudin is partially supported by the French National Research Agency through the DELICES project (ANR-19-CE38-0005-01). Rui Meng was partially supported by the Amazon Research Awards for the project “Transferable, Controllable, Applicable Keyphrase Generation” and by the University of Pittsburgh Center for Research Computing through the resources provided.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Meng, R., Mahata, D., Boudin, F. (2022). From Fundamentals to Recent Advances: A Tutorial on Keyphrasification. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_73
DOI: https://doi.org/10.1007/978-3-030-99739-7_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)