DOI: 10.1145/3573428.3573541
research-article

FeaturesRank: An unsupervised keyphrase extraction approach based on features representation for Chinese documents

Published: 15 March 2023

ABSTRACT

Keyphrase extraction captures the main content and semantics of academic literature and plays an essential role in text retrieval, classification, and clustering. We propose a new method, FeaturesRank, to automatically identify meaningful and authoritative keyphrases in Chinese academic texts. FeaturesRank integrates three keyphrase features (frequency, contextual relevance, and grammatical relation) to measure the likelihood that a sequence of words is a meaningful phrase. It also introduces a scoring mechanism that combines the influence of words in the network graph with a new “phraseness” feature to calculate a normalized score for every candidate. Experimental results on Chinese academic datasets show that the proposed method significantly improves the evaluation metrics compared with four popular keyphrase extraction methods, which verifies its effectiveness.
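The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea it describes: word influence from a graph ranking, multiplied by a phrase-level feature, then normalized over all candidates. Everything concrete here is an assumption made for illustration: the function names, the co-occurrence window, the TextRank-style PageRank used for word influence, and the simple cohesion ratio standing in for the paper's “phraseness” feature. The input is assumed to be already segmented into words, as Chinese text would need to be.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph linking words that occur within `window` positions."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (TextRank-style word influence)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = 0.0
            for neighbor, weight in graph[node].items():
                out_weight = sum(graph[neighbor].values())
                incoming += rank[neighbor] * weight / out_weight
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def count_ngram(tokens, phrase):
    """Number of times `phrase` (a tuple of words) occurs contiguously in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase)


def phraseness(tokens, phrase):
    """Hypothetical stand-in for the paper's feature: phrase frequency divided by
    the frequency of its rarest word. Single words get 1.0."""
    if len(phrase) == 1:
        return 1.0
    min_word_freq = min(tokens.count(word) for word in phrase)
    return count_ngram(tokens, phrase) / min_word_freq if min_word_freq else 0.0


def score_candidates(tokens, candidates, window=2):
    """Combine summed word influence with phraseness, then normalize to sum to 1."""
    rank = pagerank(build_cooccurrence_graph(tokens, window))
    raw = {}
    for phrase in candidates:
        influence = sum(rank.get(word, 0.0) for word in phrase)
        raw[phrase] = influence * phraseness(tokens, phrase)
    total = sum(raw.values())
    return {phrase: (score / total if total else 0.0) for phrase, score in raw.items()}


# Example with pre-segmented tokens; candidates are tuples of words.
tokens = ["关键词", "抽取", "方法", "用于", "关键词", "抽取", "任务"]
candidates = [("关键词", "抽取"), ("抽取", "关键词"), ("方法",)]
scores = score_candidates(tokens, candidates)
for phrase, score in sorted(scores.items(), key=lambda item: -item[1]):
    print("".join(phrase), round(score, 3))
```

In this sketch a candidate whose words never appear side by side receives a phraseness of zero, so strong individual words alone cannot make it a keyphrase. The paper's actual features (contextual relevance and grammatical relation in particular) are not modeled here.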

    • Published in

      ACM Other conferences
      EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
      October 2022
      1999 pages
      ISBN: 9781450397148
      DOI: 10.1145/3573428

      Copyright © 2022 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 508 of 972 submissions, 52%