DOI: 10.1145/3587716.3587972

A Lightweight Keyword Extraction Algorithm Using a Term Weighting Scheme with Word Features

Published: 07 September 2023

Abstract

The rapid growth of numerous collections of unstructured text increases the need to extract meaningful information. This paper proposes a new lightweight keyword extraction algorithm based on a term weighting scheme with multiple word features, including term frequency, inverse sentence frequency, term difference sentence, term position, and term length. The goal is to automatically extract important words and phrases from unstructured text without training data or domain-specific knowledge. The experimental results on several benchmark datasets show that the proposed algorithm significantly outperforms baseline and state-of-the-art approaches in terms of F1 scores.
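
The abstract names the word features but does not state how they are combined, so the following is only a minimal sketch of an unsupervised, feature-based term weighting scheme, not the authors' formula. It uses four of the listed features (term frequency, inverse sentence frequency, term position, and term length). The multiplicative combination, the smoothing constants, the tiny stopword list, and the function names `score_terms` and `top_keywords` are all assumptions made for illustration. The "term difference sentence" feature is omitted because the abstract does not define it.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "from"}
)


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def score_terms(text):
    """Score each word with a hypothetical product of four word features."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    sentences = [s for s in sentences if s]
    all_tokens = [w for s in sentences for w in s]
    if not all_tokens:
        return {}

    n_sent = len(sentences)
    tf = Counter(all_tokens)                                # term frequency
    sf = Counter(w for s in sentences for w in set(s))      # sentences containing the term
    first_pos = {}
    for i, w in enumerate(all_tokens):
        first_pos.setdefault(w, i)                          # index of first occurrence
    max_len = max(len(w) for w in tf)

    scores = {}
    for w, freq in tf.items():
        isf = math.log(1 + n_sent / sf[w])                  # inverse sentence frequency
        position = 1 + 1 / (1 + first_pos[w])               # earlier terms score higher
        length = 1 + len(w) / max_len                       # longer terms score higher
        scores[w] = freq * isf * position * length          # assumed combination
    return scores


def top_keywords(text, k=5):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = score_terms(text)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Calling `top_keywords(document_text, k=10)` returns candidate single-word keywords without any training data. Ranking multi-word phrases, which the abstract also mentions, would need an additional candidate-generation step that is not sketched here.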

Supplementary Material

Presentation file for "A Lightweight Keyword Extraction Algorithm Using a Term Weighting Scheme with Word Features" (18112022_lightweight_keyword_extraction_algorithm_term_weighting_scheme.pdf)

    Published In

    ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing
    February 2023
    619 pages
    ISBN: 9781450398411
    DOI: 10.1145/3587716
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature extraction
    2. keyword extraction
    3. statistical model
    4. term weighting
    5. unsupervised learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMLC 2023
