A new network model for extracting text keywords

Yang, Liu; Li, Keping; Huang, Hangfei

doi:10.1007/s11192-018-2743-5

Published: 19 April 2018

Volume 116, pages 339–361, (2018)

Scientometrics

Abstract

Text keywords are the meaningful and important words in a document; they provide a precise overview of its content and reflect the author’s writing intention. Keyword extraction methods have received considerable attention, and network-based methods are among them. However, existing network-based keyword extraction methods consider only the connections between words in a document and ignore the impact of sentences. Because a sentence is made of many words, and the words within a sentence affect one another, neglecting the influence of sentences results in a loss of information. In this paper, we introduce a word network whose nodes represent the words in a document, and we call any keyword extraction method based on a word network a Word-net method. We then propose a new network model that considers the influence of sentences, and a new word-sentence method based on that model. Experimental results demonstrate that our method outperforms the Word-net method, the classical term frequency-inverse document frequency (TF-IDF) method, the most frequent method, and the TextRank method. The precision, recall, and F-measure of our method are 7.95%, 8.27%, and 6.54% higher, respectively, than those of the Word-net method, and its average precision is 17.56% higher than that of the TF-IDF method. A two-way analysis of variance is employed to validate the empirical analysis; it indicates that the keyword extraction method and the number of keywords both have statistically significant effects on the evaluation metric values.
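As a rough illustration of the kind of baseline the abstract calls a Word-net method, the sketch below builds a word network and scores the extracted keywords with precision, recall, and F-measure. It is not the authors' model: the abstract does not specify how edges are formed or how nodes are ranked, so linking words that share a sentence and ranking by node degree are assumptions made here, and the function names and toy sentences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(sentences):
    """Link every pair of distinct words that co-occur in a sentence.

    Assumption: sentence-level co-occurrence defines an edge. The
    abstract does not state the authors' edge rule.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        for u, v in combinations(sorted(set(sentence)), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def top_keywords(graph, k):
    """Rank words by degree; ties are broken alphabetically.

    Assumption: degree is used as the importance score for simplicity.
    """
    ranked = sorted(graph, key=lambda w: (-len(graph[w]), w))
    return ranked[:k]


def precision_recall_f(extracted, reference):
    """Standard precision, recall, and F-measure over two keyword sets."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if hits else 0.0
    )
    return precision, recall, f_measure


# Toy, already-tokenised document (hypothetical data).
sentences = [
    ["keyword", "extraction", "network"],
    ["network", "model", "sentence"],
    ["sentence", "keyword", "network"],
]
graph = build_word_network(sentences)
keywords = top_keywords(graph, 2)
scores = precision_recall_f(keywords, {"network", "keyword", "model"})
```

In this toy run the two highest-degree words are compared against a hand-picked reference set; a real evaluation would use author-assigned keywords and average the metrics over many documents.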

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. U1434209), the National Key Research and Development Program of China (Grant No. 2017YFB1201105), and the Research Foundation of State Key Laboratory of Railway Traffic Control and Safety (Grant No. RCS2018ZZ003).

Author information

Authors and Affiliations

State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, 100044, China
Liu Yang, Keping Li & Hangfei Huang

Corresponding author

Correspondence to Liu Yang.

About this article

Cite this article

Yang, L., Li, K. & Huang, H. A new network model for extracting text keywords. Scientometrics 116, 339–361 (2018). https://doi.org/10.1007/s11192-018-2743-5

Received: 03 January 2018
Published: 19 April 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11192-018-2743-5
