Keyword extraction as sequence labeling with classification algorithms

Kılıç Ünlü, Hüma; Çetin, Aydın

doi:10.1007/s00521-022-07906-x

Keyword extraction as sequence labeling with classification algorithms

Original Article
Published: 11 October 2022

Volume 35, pages 3413–3422, (2023)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

671 Accesses
9 Citations
1 Altmetric
Explore all metrics

Abstract

Keyword extraction is one of the main problems in clustering and linking textual content. In literature, several machine learning approaches were proposed for keyword and keyphrase extraction. However, the state-of-the-art performance results are still below the expectations. In this paper, we propose a novel hybrid keyword extraction model, HybridKEM. The proposed model addresses the keyword extraction problem as a sequence labelling task. Naive Bayes (NB), Polynomial Regression (PR) Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF) classification algorithms were trained separately in the Token Classification module of the model. The Token Classification process was performed by using text, graphic, embedding, and set features in the model. The performance of the model was evaluated using the Inspec, Semeval-2017, 500N-KPCrowd datasets, which are widely used in studies in the literature, and two newly collected, TRDizinEn and DergiParkEn datasets. The model achieved an average $F_1$-score of 0.664 for all datasets. The highest $F_1$-score (0.74) was obtained with the TRDizinEn dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Novel Graph-Based Ensemble Token Classification Model for Keyword Extraction

Article 04 March 2023

Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods

YAKE! Collection-Independent Automatic Keyword Extractor

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Shamshirband S, Rabczuk T, Chau K-W (2019) A survey of deep learning techniques: application in wind and solar energy resources. IEEE Access 7:164650–164666
Article Google Scholar
Fan Y, Xu K, Wu H, Zheng Y, Tao B (2020) Spatiotemporal modeling for nonlinear distributed thermal processes based on kl decomposition, mlp and lstm network. IEEE Access 8:25111–25121
Article Google Scholar
Afan HA, Osman A, Essam Y, Ahmed AN, Huang YF, Kisi O, Sherif M, Sefelnasr A, Chau K-W, El-Shafie A (2021) Modeling the fluctuations of groundwater level by employing ensemble deep learning techniques. Eng Appl Comput Fluid Mech 15(1):1420–1439
Google Scholar
Wang W-C, Du Y-J, Chau K-W, Xu D-M, Liu C-J, Ma Q (2021) An ensemble hybrid forecasting model for annual runoff based on sample entropy, secondary decomposition, and long short-term memory neural network. Water Resour Manage 35(14):4695–4726
Article Google Scholar
Chen C, Zhang Q, Kashani MH, Jun C, Bateni SM, Band SS, Dash SS, Chau K-W (2022) Forecast of rainfall distribution based on fixed sliding window long short-term memory. Eng Appl Comput Fluid Mech 16(1):248–261
Google Scholar
Wang X, Zhang S, Qiao H, Liu L, Tian F (2022) Mid-long term forecasting of reservoir inflow using the coupling of time-varying filter-based empirical mode decomposition and gated recurrent unit. Environ Sci Pollut Res 45:1–18
Google Scholar
Jung S, Jeoung J, Hong T (2022) Occupant-centered real-time control of indoor temperature using deep learning algorithms. Build Environ 208:108633
Article Google Scholar
Tomokiyo T, Hurst M (2003) A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, pp. 33–40
Haddoud M, Mokhtari A, Lecroq T, Abdeddaïm S (2015) Accurate keyphrase extraction from scientific papers by mining linguistic information. In: CLBib@ ISSI, pp. 12–17
Hong B, Zhen D (2012) An extended keyword extraction method. Phys Proc 24:1120–1127
Article Google Scholar
Ramos J, et al (2003) Using tf-idf to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 29–48. Citeseer
El-Beltagy SR, Rafea A (2009) Kp-miner: a keyphrase extraction system for english and arabic documents. Inf Syst 34(1):132–144
Article Google Scholar
Campos R, Mangaravite V, Pasquali A, Jorge AM, Nunes C, Jatowt A (2018) A text feature based automatic keyword extraction method for single documents. In: European Conference on Information Retrieval, pp. 684–691. Springer
Mihalcea R, Tarau P (2004) Textrank: Bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411
Zhao WX, Jiang J, He J, Song Y, Achanauparp P, Lim E-P, Li X (2011) Topical keyphrase extraction from twitter. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 379–388
Alfarra MR, Alfarra A (2018) Graph-based technique for extracting keyphrases in a single-document (gtek). In: 2018 International Conference on Promising Electronic Technologies (ICPET), pp. 92–97. IEEE
Bennani-Smires K, Musat C, Hossmann A, Baeriswyl M, Jaggi M (2018) Simple unsupervised keyphrase extraction using sentence embeddings. Preprint at https://arxiv.org/abs/1801.04470
Sun Y, Qiu H, Zheng Y, Wang Z, Zhang C (2020) Sifrank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model. IEEE Access 8:10896–10906
Article Google Scholar
Liang X, Wu S, Li M, Li Z (2021) Unsupervised keyphrase extraction by jointly modeling local and global context. Preprint at https://arxiv.org/abs/2109.07293
Ajallouda L, Fagroud FZ, Zellou A, Lahmar EB (2022) Kp-use: an unsupervised approach for key-phrases extraction from documents. Int J Adv Computer Sci Appl 13:4
Google Scholar
Lau JH, Baldwin T (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. http://arxiv.org/abs/1607.05368
Pagliardini M, Gupta P, Jaggi M (2017) Unsupervised learning of sentence embeddings using compositional n-gram features. http://arxiv.org/abs/1703.02507
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805
Cer D, Yang Y, Kong S-Y, Hua N, Limtiaco N, John RS, Constant N, Guajardo-Cespedes M, Yuan S, Tar C et al (2018) Universal sentence encoder. http://arxiv.org/abs/1803.11175
Zehtab-Salmasi A, Feizi-Derakhshi M-R, Balafar M-A (2021) FRAKE: Fusional real-time automatic keyword extraction. Preprint at https://arxiv.org/abs/2104.04830
Shen X, Wang Y, Meng R, Shang J (2022) Unsupervised deep keyphrase generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 11303–11311
Meng R, Zhao S, Han S, He D, Brusilovsky P, Chi Y (2017) Deep keyphrase generation. Preprint at https://arxiv.org/abs/1704.06879
Yuan X, Wang T, Meng R, Thaker K, Brusilovsky P, He D, Trischler A (2018) One size does not fit all: generating and evaluating variable number of keyphrases. Preprint at https://arxiv.org/abs/1810.05241
Ye J, Cai R, Gui T, Zhang Q (2021) Heterogeneous graph neural networks for keyphrase generation. Preprint at https://arxiv.org/abs/2109.04703
Wu H, Liu W, Li L, Nie D, Chen T, Zhang F, Wang D (2021) UniKeyphrase: a unified extraction and generation framework for keyphrase prediction. Preprint at https://arxiv.org/abs/2106.04847
Zhang Y, Jiang T, Yang T, Li X, Wang S (2022) Htkg: Deep keyphrase generation with neural hierarchical topic guidance
Yang P, Ge Y, Yao Y, Yang Y (2022) Gcn-based document representation for keyphrase generation enhanced by maximizing mutual information. Knowl-Based Syst 243:108488
Article Google Scholar
Sahrawat D, Mahata D, Zhang H, Kulkarni M, Sharma A, Gosangi R, Stent A, Kumar Y, Shah RR, Zimmermann R (2020) Keyphrase extraction as sequence labeling using contextualized embeddings. In: European Conference on Information Retrieval, pp. 328–335. Springer
Duari S, Bhatnagar V (2020) Complex network based supervised keyword extractor. Expert Syst Appl 140:112876
Article Google Scholar
Liu R, Lin Z, Wang W (2020) Keyphrase prediction with pre-trained language model. arXiv preprint http://arxiv.org/abs/2004.10462
Gero Z, Ho J (2021) Word centrality constrained representation for keyphrase extraction. In: Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 155–161
Nikzad-Khasmakhi N, Feizi-Derakhshi M-R, Asgari-Chenaghlu M, Balafar M-A, Feizi-Derakhshi A-R, Rahkar-Farshi T, Ramezani M, Jahanbakhsh-Nagadeh Z, Zafarani-Moattar E, Ranjbar-Khadivi M (2021) Phraseformer: Multimodal key-phrase extraction using transformer and graph embedding. http://arxiv.org/abs/2106.04939
Basaldella M, Antolli E, Serra G, Tasso C (2018) Bidirectional lstm recurrent neural network for keyphrase extraction. In: Italian Research Conference on Digital Libraries, pp. 180–187. Springer
Alzaidy R, Caragea C, Giles CL (2019) Bi-lstm-crf sequence labeling for keyphrase extraction from scholarly documents. In: The World Wide Web Conference, pp. 2551–2557
Vega-Oliveros DA, Gomes PS, Milios EE, Berton L (2019) A multi-centrality index for graph-based keyword extraction. Inf Process Manage 56(6):102063
Article Google Scholar
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Computers Electr Eng 40(1):16–28
Article Google Scholar
Hulth A (2003) Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223
Marujo L, Viveiros M, Neto JPDS (2013) Keyphrase cloud generation of broadcast news. Preprint at https://arxiv.org/abs/1306.4606
Augenstein I, Das M, Riedel S, Vikraman L, McCallum A (2014) Semeval 2017 task 10: Scienceie-extracting keyphrases and relations from scientific publications. Preprint at https://arxiv.org/abs/1704.02853
Krapivin M, Autaeu A, Marchese M (2009) Large dataset for keyphrases extraction
Nguyen TD, Kan M-Y (2007) Keyphrase extraction in scientific publications. In: International Conference on Asian Digital Libraries, pp. 317–326. Springer
Aronson AR, Bodenreider O, Chang HF, Humphrey SM, Mork JG, Nelson SJ, Rindflesch TC, Wilbur WJ (2000) The nlm indexing initiative. In: Proceedings of the AMIA Symposium, p. 17. American Medical Informatics Association
Kim SN, Medelyan O, Kan M-Y, Baldwin T, Pingar L (2010) Semeval-2010 task 5: Automatic keyphrase extraction from scientific
Zhao M-J, Edakunni N, Pocock A, Brown G (2013) Beyond fano’s inequality: bounds on the optimal f-score, ber, and cost-sensitive risk and their implications. J Mach Learn Res 14(1):1033–1090
MATH Google Scholar
Marcot BG, Hanea AM (2021) What is an optimal value of k in k-fold cross-validation in discrete bayesian network analysis? Comput Stat 36(3):2009–2031
Article MATH Google Scholar
Argamon S, Levitan S (2005) Measuring the usefulness of function words for authorship attribution. In: Proceedings of the Joint Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 1–3
Ghosh S, Saha C, Molakathaala N (2022) Neuragen-a low-resource neural network based approach for gender classification. http://arxiv.org/abs/2203.15253
Hafeez S, Kathirisetty N (2022) Effects and comparison of different data pre-processing techniques and ml and deep learning models for sentiment analysis: Svm, knn, pca with svm and cnn. In: 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR), pp. 1–6. IEEE
Passon M, Comuzzo M, Serra G, Tasso C (2019) 0Keyphrase extraction via an attentive model. In: Italian Research Conference on Digital Libraries, pp. 304–314. Springer

Download references

Acknowledgements

We thank TUBITAK Ulakbim for providing the TRDizinEn dataset for this study. We make the DergiParkEn dataset publicly available at http://github.com/humakilicunlu/DergiParkEn.

Author information

Authors and Affiliations

Computer Engineering Department, Faculty of Technology, Gazi University, 06500, Ankara, Turkey
Hüma Kılıç Ünlü & Aydın Çetin

Authors

Hüma Kılıç Ünlü
View author publications
You can also search for this author inPubMed Google Scholar
Aydın Çetin
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Hüma Kılıç Ünlü.

Ethics declarations

Conflict of Interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Kılıç Ünlü, H., Çetin, A. Keyword extraction as sequence labeling with classification algorithms. Neural Comput & Applic 35, 3413–3422 (2023). https://doi.org/10.1007/s00521-022-07906-x

Download citation

Received: 22 April 2022
Accepted: 28 September 2022
Published: 11 October 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00521-022-07906-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Keyword extraction as sequence labeling with classification algorithms

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Novel Graph-Based Ensemble Token Classification Model for Keyword Extraction

Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods

YAKE! Collection-Independent Automatic Keyword Extractor

Explore related subjects

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of Interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now