
A new network model for extracting text keywords

Scientometrics

Abstract

Text keywords are the meaningful and important words in a document; they provide a precise overview of its content and reflect the author’s writing intention. Keyword extraction methods have received considerable attention, and network-based methods are among them. However, existing network-based keyword extraction methods consider only the connections between words in a document and ignore the impact of sentences. Because a sentence is made of many words, and the words within a sentence affect one another, neglecting the influence of sentences results in a loss of information. In this paper, we introduce a word network whose nodes represent the words in a document, and we refer to any keyword extraction method based on a word network as a Word-net method. We then propose a new network model that considers the influence of sentences, together with a new word-sentence method based on this model. Experimental results demonstrate that our method outperforms the Word-net method, the classical term frequency-inverse document frequency (TF-IDF) method, the most frequent method, and the TextRank method. The precision, recall, and F-measure of our results are 7.95%, 8.27%, and 6.54% higher, respectively, than those of the Word-net method, and the average precision of our results is 17.56% higher than that of the TF-IDF method. A two-way analysis of variance is employed to validate the empirical analysis; it indicates that both the keyword extraction method and the number of keywords have statistically significant effects on the values of the evaluation metrics.
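
To make the terms in the abstract concrete, the following is a minimal Python sketch of a generic word network and of the precision, recall, and F-measure used for evaluation. It is an illustration under stated assumptions, not the word-sentence model proposed in the paper: the abstract does not specify how edges are defined, so the sketch assumes that two words are linked when they occur in the same sentence and ranks words by node degree. The function names (build_word_network, top_k_by_degree, precision_recall_f) and the simple regex tokenization are hypothetical choices made for this example.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_word_network(text):
    """Return {word: set of neighbours}.

    Assumption for illustration: two words are linked if they share a sentence.
    """
    neighbours = defaultdict(set)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        for w in words:
            neighbours[w]  # register the node even if it has no neighbours
        for a, b in combinations(sorted(words), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return dict(neighbours)


def top_k_by_degree(network, k):
    """Rank words by degree (ties broken alphabetically) and keep the first k."""
    ranked = sorted(network, key=lambda w: (-len(network[w]), w))
    return ranked[:k]


def precision_recall_f(extracted, reference):
    """Compare extracted keywords against a reference keyword set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    toy_text = "Cats chase mice. Cats like milk. Dogs chase cats."
    network = build_word_network(toy_text)
    keywords = top_k_by_degree(network, 2)
    print(keywords)
    print(precision_recall_f(keywords, ["cats", "milk"]))
```

A practical extractor would also remove stop words and normalize word forms before building the network; those steps are omitted here to keep the sketch short.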



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. U1434209), the National Key Research and Development Program of China (Grant No. 2017YFB1201105), and the Research Foundation of State Key Laboratory of Railway Traffic Control and Safety (Grant No. RCS2018ZZ003).

Author information

Corresponding author

Correspondence to Liu Yang.

About this article

Cite this article

Yang, L., Li, K. & Huang, H. A new network model for extracting text keywords. Scientometrics 116, 339–361 (2018). https://doi.org/10.1007/s11192-018-2743-5

  • DOI: https://doi.org/10.1007/s11192-018-2743-5
