ABSTRACT
Keyphrase extraction captures the main content and semantics of academic literature and plays an essential role in text retrieval, classification, and clustering. We propose a new method, FeaturesRank, to automatically identify meaningful and authoritative keyphrases in Chinese academic texts. FeaturesRank integrates three features of a keyphrase (frequency, contextual relevance, and grammatical relation) to measure how likely a sequence of words is to form a meaningful phrase. It also introduces a scoring mechanism that combines the influence of words in the network graph with a new “phraseness” feature to calculate a normalized score for every candidate. Experimental results on Chinese academic datasets show that the proposed method significantly improves the evaluation metrics compared with four popular keyphrase extraction methods, which verifies its effectiveness.
Index Terms
- FeaturesRank: An unsupervised keyphrase extraction approach based on features representation for Chinese documents