A document-structure-based complex network model for extracting text keywords

Liu, YiJun; Zhang, Li; Lian, Xiaoli

doi:10.1007/s11192-020-03542-1

Published: 17 June 2020

Scientometrics, Volume 124, pages 1765–1791 (2020)

Abstract

Keywords, which serve as a dense summary of a document, are widely used by search engines and libraries for information retrieval, content classification, speech recognition, and automated text summarization. However, a large number of documents lack keywords, and the volume of content generated every day makes human annotation very time-consuming. Many studies show that network-based approaches perform well at extracting text keywords. Traditionally, words are connected based on their occurrence in documents. One recent work shows that sentences have a significant influence on keyword extraction, beyond traditional methods that consider only words. In addition to words and sentences, chapters are essential parts of a document and carry its higher-level semantic logic. Inspired by this idea, we assume that chapters should also contribute to keyword extraction. We add the chapter factor to build a three-layer network model and propose a Word-Sentence-Chapter network-based approach for keyword extraction. Two experiments, on Chinese and English documents respectively, indicate that our approach outperforms the state of the art.
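The abstract does not specify how the three layers are combined, so the following is only a toy illustration of the general idea, not the authors' model. It assumes a document is given as chapters, each a list of sentences, each a list of words; it links words that share a sentence and, more weakly, words that share a chapter, then ranks words by weighted degree. The function names and the weights `w_sentence` and `w_chapter` are hypothetical choices made for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def build_layered_cooccurrence(chapters, w_sentence=1.0, w_chapter=0.5):
    """Build a weighted word graph from chapters -> sentences -> words.

    The layer weights are assumptions for illustration only; the abstract
    does not say how sentence and chapter information is weighted.
    """
    weights = defaultdict(float)
    for sentences in chapters:
        # Sentence layer: link every pair of distinct words in a sentence.
        for words in sentences:
            for a, b in combinations(sorted(set(words)), 2):
                weights[(a, b)] += w_sentence
        # Chapter layer: link every pair of distinct words in the chapter.
        chapter_vocab = sorted({w for words in sentences for w in words})
        for a, b in combinations(chapter_vocab, 2):
            weights[(a, b)] += w_chapter
    return dict(weights)


def rank_by_strength(weights):
    """Rank words by weighted degree (sum of incident edge weights)."""
    strength = defaultdict(float)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    # Highest strength first; ties broken alphabetically.
    return sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Two toy chapters; the first has two sentences, the second has one.
    doc = [
        [["network", "keyword"], ["network", "graph"]],
        [["network", "chapter"]],
    ]
    for word, score in rank_by_strength(build_layered_cooccurrence(doc)):
        print(word, score)
```

Weighted degree is used here only because it is the simplest centrality to compute by hand; a real system would likely combine several network measures, and the paper's own scoring may differ entirely.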

Acknowledgements

Funding for this work has been provided by the National Science Foundation of China Grant Nos. 61672078 and 61732019.

Author information

Authors and Affiliations

Beihang University, Beijing, China
YiJun Liu, Li Zhang & Xiaoli Lian

Corresponding author

Correspondence to Xiaoli Lian.

About this article

Cite this article

Liu, Y., Zhang, L. & Lian, X. A document-structure-based complex network model for extracting text keywords. Scientometrics 124, 1765–1791 (2020). https://doi.org/10.1007/s11192-020-03542-1

Received: 19 December 2018
Published: 17 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11192-020-03542-1
