Abstract
We present TopicLPRank, a keyphrase extraction algorithm that improves on TopicRank. TopicRank uses only relative distance information in the text; we argue that the length and absolute position of candidate keyphrases also influence the quality of the extracted keyphrases. TopicLPRank therefore incorporates these two factors on top of TopicRank. Experimental results show that adding the position information and the length information of candidate keyphrases individually raises the F-score by about 2.7 and 1.7 percentage points, respectively, which corresponds to relative improvements of 19.6% and 12.3% over TopicRank. Combining the length and position information raises the F-score by about 3.5 percentage points, a relative improvement of 25.21% over TopicRank on the NUS dataset.
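The abstract reports each gain twice, once in absolute percentage points and once as a relative improvement over TopicRank. The two are linked by relative gain = absolute gain / baseline F-score. The baseline itself is not stated in the abstract, so the sketch below only checks that the three reported pairs imply a consistent baseline. The function name and dictionary keys are illustrative, and the implied values are inferences from rounded figures, not reported results.

```python
def implied_baseline(abs_gain_points: float, rel_gain_percent: float) -> float:
    """Baseline F-score (in points) implied by an absolute gain and its relative gain."""
    return abs_gain_points / (rel_gain_percent / 100.0)


# (absolute gain in percentage points, relative gain in percent), as given in the abstract
reported_gains = {
    "position only": (2.7, 19.6),
    "length only": (1.7, 12.3),
    "length + position": (3.5, 25.21),
}

# Implied TopicRank baseline for each variant; an inference, not a reported number
baselines = {
    name: implied_baseline(abs_gain, rel_gain)
    for name, (abs_gain, rel_gain) in reported_gains.items()
}

for name, value in baselines.items():
    print(f"{name}: implied baseline F-score = {value:.2f}")
```

All three pairs point to roughly the same baseline, so the absolute and relative figures are mutually consistent up to rounding.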









Data availability
All data included in this study are available from the corresponding author upon request.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant numbers 62077023 and 61937001) and by the National Key R&D Program of China project "Large-Scale Longitudinal and Cross-Sectional Study of Student Development" (grant number 2021YFC3340800).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Liao, S., Yang, Z., Liao, Q. et al. TopicLPRank: a keyphrase extraction method based on improved TopicRank. J Supercomput 79, 9073–9092 (2023). https://doi.org/10.1007/s11227-022-05022-0
DOI: https://doi.org/10.1007/s11227-022-05022-0