Document keyword extraction based on semantic hierarchical graph model

Zhang, Tingting; Lee, Baozhen; Zhu, Qinghua; Han, Xi; Chen, Ke

doi:10.1007/s11192-023-04677-7

Published: 30 March 2023

Scientometrics, volume 128, pages 2623–2647 (2023)

Tingting Zhang¹,
Baozhen Lee ORCID: orcid.org/0000-0002-6160-1390¹,
Qinghua Zhu²,
Xi Han³ &
Ke Chen¹

Abstract

Keywords provide a brief profile of a document's contents and serve as an important means of quickly identifying its themes. Traditional keyword extraction methods are mostly based on statistical relationships between words, with no deeper understanding of the words' structures. In addition, most keyword extraction studies to date rely on ranking-related measure values and do not consider the cohesion of the extracted keyword set. In this paper, a keyword extraction method based on a semantic hierarchical graph model is proposed. First, the semantic graph for the document is constructed based on the hierarchical extraction of feature terms. Then, the keyword collection of the document is chosen from the constructed semantic graph. The proposed method accounts for both the context of the keywords and the internal structure by which they are related. By mining the deep hidden structure of feature terms, the method can reveal the hierarchical association between terms within the semantic graph and obtain a high-probability keyword collection. Moreover, several experiments conducted on released datasets show that our method outperforms existing methods in terms of precision, recall, and F-measure.
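The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes exact matching of keywords after lowercasing and trimming, and the balanced F-measure (F1); the paper's actual matching rule (for example, stemming) is not stated in this preview, and the function name `keyword_prf` and the sample keyword lists are hypothetical.

```python
def keyword_prf(extracted, gold):
    """Return (precision, recall, F1) for an extracted keyword set.

    Assumption: a keyword counts as correct only if it matches a gold
    keyword exactly after lowercasing and stripping whitespace.
    """
    extracted_set = {k.strip().lower() for k in extracted}
    gold_set = {k.strip().lower() for k in gold}
    hits = len(extracted_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example data, not taken from the paper.
extracted = ["graph model", "Keyword Extraction", "semantics", "ranking"]
gold = ["keyword extraction", "graph model", "semantic graph"]
precision, recall, f1 = keyword_prf(extracted, gold)
```

Here two of the four extracted keywords appear in the three-item gold set, so precision is 2/4 and recall is 2/3. Because the measures are computed over sets, duplicate extractions are counted once.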

Acknowledgements

The authors are grateful to the reviewers for their valuable suggestions on how to improve the paper. We thank Dr. Mouda for help with the manuscript writing and Dr. Haobo for the proofreading. The authors acknowledge support from the National Natural Science Foundation of China (Project Nos. 72074117, 71972090, 72274040), the University Science Research Project and Philosophy and Social Science Research of Jiangsu Province (Project Nos. 20KJB630012, 2020SJA0344), and the Significant Project of Jiangsu College Philosophy and Social Sciences Research (Project No. 2021SJZDA153). We also thank Dr. Minwei for advice on experimental design.

Author information

Authors and Affiliations

School of Information Engineering, Nanjing Audit University, Nanjing, 211815, China
Tingting Zhang, Baozhen Lee & Ke Chen
School of Information Management, Nanjing University, Nanjing, 210023, China
Qinghua Zhu
School of Business Administration, Guangdong University of Finance and Economics, Guangzhou, 510320, China
Xi Han

Corresponding author

Correspondence to Baozhen Lee.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, T., Lee, B., Zhu, Q. et al. Document keyword extraction based on semantic hierarchical graph model. Scientometrics 128, 2623–2647 (2023). https://doi.org/10.1007/s11192-023-04677-7

Received: 31 January 2021
Accepted: 24 February 2023
Published: 30 March 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11192-023-04677-7
