FeaturesRank: An unsupervised keyphrase extraction approach based on features representation for Chinese documents

Authors:
Yining Zhao, College of Computer Sciences, Qufu Normal University, China and Shandong Institute of Big Data, China (ORCID: 0000-0002-8518-6152)
Xiaomin Zhu, Shandong Institute of Big Data, China (ORCID: 0000-0002-7983-8978)
Maoli Wang, College of Cybersecurity, Qufu Normal University, China (ORCID: 0000-0001-5420-1463)
Xinming Wang, Shandong Institute of Big Data, China (ORCID: 0000-0002-9741-4853)
Min Zou, Shandong Institute of Big Data, China (ORCID: 0000-0001-5307-0131)
Kaizhi Li, College of Computer Sciences, Qufu Normal University, China (ORCID: 0000-0003-1240-9287)

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022
Pages 637–643
https://doi.org/10.1145/3573428.3573541

Published: 15 March 2023

ABSTRACT

Keyphrase extraction captures the main content and semantics of academic literature and plays an essential role in text retrieval, classification, and clustering. We propose a new method, FeaturesRank, to automatically identify meaningful and authoritative keyphrases in Chinese academic texts. FeaturesRank integrates three features of a keyphrase (frequency, contextual relevance, and grammatical relation) to measure how likely a sequence of words is to form a meaningful phrase. It also introduces a scoring mechanism that combines the influence of words in the network graph with a new “phraseness” feature to calculate a normalized score for every candidate. Experimental results on Chinese academic datasets show that the proposed method significantly improves the evaluation metrics compared with four popular keyphrase extraction methods, which verifies its effectiveness.
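
The abstract does not give the scoring formulas, so the following Python sketch only illustrates the general idea and should not be read as the authors' implementation. It assumes the text is already segmented and part-of-speech tagged, uses a plain TextRank-style iteration for word influence, and defines a hypothetical phraseness as frequency times a simple cohesion ratio, gated by a noun-phrase pattern. The function names, the tag set ("n", "a", "v"), the window size, and the way the terms are multiplied are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def word_rank(tokens, window=3, damping=0.85, iters=50):
    """TextRank-style word scores on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def rank_phrases(tagged, max_len=3):
    """Score candidate n-grams; every formula here is an assumption."""
    tokens = [w for w, _ in tagged]
    influence = word_rank(tokens)
    word_counts = Counter(tokens)
    phrase_counts = Counter()
    phrase_tags = {}
    for n in range(1, max_len + 1):
        for i in range(len(tagged) - n + 1):
            gram = tuple(tokens[i : i + n])
            phrase_counts[gram] += 1
            phrase_tags[gram] = tuple(t for _, t in tagged[i : i + n])
    raw = {}
    for gram, count in phrase_counts.items():
        tags = phrase_tags[gram]
        # Grammatical relation (assumed): nouns/adjectives ending in a noun.
        if tags[-1] != "n" or any(t not in ("n", "a") for t in tags):
            continue
        # Contextual relevance (assumed): share of the rarest word's
        # occurrences that fall inside this phrase.
        cohesion = count / min(word_counts[w] for w in gram)
        phraseness = count * cohesion  # frequency x cohesion
        raw[gram] = sum(influence.get(w, 0.0) for w in gram) * phraseness
    top = max(raw.values(), default=0.0)
    # Normalize so the best candidate scores 1.0.
    return {"".join(g): s / top for g, s in raw.items()} if top else {}


# Hypothetical pre-segmented, pre-tagged input.
tagged = [
    ("关键词", "n"), ("抽取", "n"), ("是", "v"),
    ("文本", "n"), ("挖掘", "n"), ("任务", "n"),
    ("关键词", "n"), ("抽取", "n"), ("方法", "n"),
]
scores = rank_phrases(tagged)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(phrase, round(score, 3))
```

Dividing by the maximum is only one way to obtain a normalized score; the paper may normalize differently, for example by phrase length.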

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software development process management
      1. Software development methods
        Agile software development

Published in

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022
1999 pages
ISBN:9781450397148
DOI:10.1145/3573428

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
TextRank
features representation
keyphrase extraction technology
scoring mechanism
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 508 of 972 submissions, 52%