Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set

Qiu, Dong; Zheng, Qin

doi:10.1007/s40815-021-01190-y

Published: 21 November 2021

Volume 24, pages 1332–1342 (2022)

International Journal of Fuzzy Systems

Dong Qiu¹,² & Qin Zheng¹

Abstract

To address the shortcomings of the TextRank method (TM), which considers only word co-occurrence and the initial word importance when extracting keywords, this paper proposes an unsupervised keyword extraction method based on tolerance rough sets (TRS). How the words in a document are scored has a significant influence on word-graph modeling. We improve TM in two respects using TRS theory, which is used to mine lexical, semantic, grammatical, and other information from the corpus. First, the degree to which each word belongs to each document is calculated to form a fuzzy membership matrix, which helps characterize the initial word importance. Second, the fuzzy membership of words to each word's tolerance class is calculated to form a semantic correlation matrix, which helps optimize the transition probabilities on all graph edges. Applied to clustering tasks on two datasets, the proposed methods outperform strong baselines.
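The abstract names the two places where TRS information enters TextRank (the initial word importance and the edge transition probabilities) but does not give the formulas, so the following Python sketch is only one plausible reading, not the authors' method. Assumptions: tolerance classes follow the standard TRS document model (a word's class is the word plus every word co-occurring with it in at least θ documents, parameter `theta`); the word-to-document membership is the rough inclusion |I(t) ∩ d| / |I(t)|; and the word-word correlation `semantic_corr` and the edge-boost rule `c * (1 + corr)` are hypothetical choices made for illustration. The membership vector replaces TextRank's uniform random-jump term.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Tolerance class I_theta(t): t plus every word that co-occurs with t
    in at least `theta` documents (standard TRS document model)."""
    doc_sets = [set(d) for d in docs]
    pair_df = defaultdict(int)  # number of documents containing both words
    for s in doc_sets:
        for a, b in combinations(sorted(s), 2):
            pair_df[(a, b)] += 1
    classes = {t: {t} for t in set().union(*doc_sets)}
    for (a, b), n in pair_df.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def membership(word, doc_set, classes):
    """Assumed fuzzy degree to which `word` belongs to a document:
    the rough inclusion |I(t) ∩ d| / |I(t)|."""
    cls = classes[word]
    return len(cls & doc_set) / len(cls)


def semantic_corr(a, b, classes):
    """Hypothetical word-word correlation: share of a's tolerance class
    that also lies in b's tolerance class."""
    return len(classes[a] & classes[b]) / len(classes[a])


def trs_textrank(doc, docs, theta=2, window=2, d=0.85, iters=50):
    """Rank the distinct words of `doc` (a token list that must also be in
    `docs`) with a TRS-biased TextRank. Returns words, best first."""
    classes = tolerance_classes(docs, theta)
    doc_set = set(doc)
    words = sorted(doc_set)
    # Plain TextRank edges: co-occurrence within a sliding window.
    cooc = defaultdict(float)
    for i, w in enumerate(doc):
        for v in doc[i + 1 : i + window]:
            if v != w:
                cooc[(w, v)] += 1
                cooc[(v, w)] += 1
    # Assumed reweighting: boost edges between semantically related words.
    weights = {
        (a, b): c * (1 + semantic_corr(a, b, classes))
        for (a, b), c in cooc.items()
    }
    out_sum = defaultdict(float)
    for (a, _), c in weights.items():
        out_sum[a] += c
    # Random-jump vector from document membership instead of uniform 1/N.
    prior = {w: membership(w, doc_set, classes) for w in words}
    total = sum(prior.values())
    prior = {w: p / total for w, p in prior.items()}
    score = dict(prior)
    for _ in range(iters):
        new = {w: (1 - d) * prior[w] for w in words}
        for (a, b), c in weights.items():
            # Row-normalized transition probability a -> b.
            new[b] += d * score[a] * c / out_sum[a]
        score = new
    return sorted(words, key=lambda w: -score[w])
```

With `docs` a list of tokenized documents, `trs_textrank(docs[0], docs)[:5]` returns the five highest-ranked words of the first document. The sketch leaves the TextRank iteration itself unchanged and swaps only its two inputs, the jump vector and the edge weights, which is where the abstract says the TRS information is used.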

Similar content being viewed by others

Document keyword extraction based on semantic hierarchical graph model (Article, 30 March 2023)

TextRank Keyword Extraction Method Based on Multi-feature Fusion

Improved Automatic Keyword Extraction Given More Semantic Knowledge

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 12171065 and 11671001).

Author information

Authors and Affiliations

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Nanan, Chongqing, 400065, China
Dong Qiu & Qin Zheng
College of Science, Chongqing University of Posts and Telecommunications, Nanan, Chongqing, 400065, China
Dong Qiu

Corresponding author

Correspondence to Dong Qiu.

About this article

Cite this article

Qiu, D., Zheng, Q. Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set. Int. J. Fuzzy Syst. 24, 1332–1342 (2022). https://doi.org/10.1007/s40815-021-01190-y

Received: 25 April 2021
Revised: 04 August 2021
Accepted: 15 September 2021
Published: 21 November 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s40815-021-01190-y
