DOI: 10.1145/2993318.2993323
research-article

A Supervised KeyPhrase Extraction System

Published: 12 September 2016

Abstract

In this paper, we present a multi-featured, supervised automatic keyword extraction system. We extracted salient semantic features that describe candidate keyphrases and used them to train a Random Forest classifier. The system achieved a precision of 58.3% and was shown to outperform two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved a personal-best precision of 32.7 and F-measure of 25.5 on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used and the results obtained.
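The abstract outlines a supervised pipeline: candidate keyphrases are described by features, a Random Forest classifier labels them, and results are reported as precision and F-measure. The Python sketch below illustrates the steps around the classifier under stated assumptions. The stopword-bounded n-gram candidates, the three features (frequency, relative position of first occurrence, length in words) and all function names are hypothetical illustrations, not the feature set described in the paper. The `(X, y)` rows from `training_rows` could be passed to any Random Forest implementation (for example scikit-learn's `RandomForestClassifier`), which is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical sketch: helper names, stopword list and features are
# illustrative assumptions, not the feature set used in the paper.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to",
             "is", "are", "was", "we", "our", "with", "by", "as", "this", "that"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def candidates(tokens, max_len=3):
    """Yield (phrase, start_index) for every n-gram of 1..max_len tokens
    that neither starts nor ends with a stopword."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:  # ran past the end of the text
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i

def features(text, max_len=3):
    """Map each candidate to [frequency, relative first position, length in words]."""
    tokens = tokenize(text)
    counts, first = Counter(), {}
    for phrase, pos in candidates(tokens, max_len):
        counts[phrase] += 1
        first.setdefault(phrase, pos)
    n_tokens = max(1, len(tokens))
    return {p: [counts[p], first[p] / n_tokens, len(p.split())] for p in counts}

def training_rows(text, gold):
    """Pair each candidate's feature vector with a 0/1 label (1 = gold keyphrase)."""
    fv = features(text)
    gold_set = {" ".join(tokenize(g)) for g in gold}
    phrases = sorted(fv)
    return [fv[p] for p in phrases], [int(p in gold_set) for p in phrases]

def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and balanced F-measure, in percent."""
    pred = {" ".join(tokenize(p)) for p in predicted}
    ref = {" ".join(tokenize(g)) for g in gold}
    tp = len(pred & ref)
    p = 100 * tp / len(pred) if pred else 0.0
    r = 100 * tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Usage on a toy document
doc = "Random forest classifiers rank candidate keyphrases. A random forest is an ensemble."
X, y = training_rows(doc, ["Random forest", "candidate keyphrases"])
print(sum(y))  # 2 positive candidates
print(precision_recall_f1(["random forest", "rank"], ["Random forest", "ensemble"]))  # (50.0, 50.0, 50.0)
```

The F-measure here is the balanced harmonic mean of precision and recall. Matching is exact after lowercasing and tokenizing; published evaluation protocols may also stem phrases before comparing them, which this sketch does not do.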

Information

Published In

ACM Other conferences
SEMANTiCS 2016: Proceedings of the 12th International Conference on Semantic Systems
September 2016
207 pages
ISBN:9781450347525
DOI:10.1145/2993318
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Ghent University
  • AIT: Austrian Institute of Technology
  • Stanford University
  • Wolters Kluwer, Germany
  • Semantic Web Company

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Keywords
  2. Random Forest
  3. keyphrase

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SEMANTiCS 2016

Acceptance Rates

SEMANTiCS 2016 Paper Acceptance Rate 18 of 85 submissions, 21%;
Overall Acceptance Rate 40 of 182 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Feb 2025

Cited By

  • (2025) Unstructured Text Data Security Attribute Mining Method Based on Multi‐Model Collaboration. Concurrency and Computation: Practice and Experience, 37:3. DOI: 10.1002/cpe.8367. Online publication date: 20-Jan-2025
  • (2024) Rapid Unsupervised Keyphrase Extraction from Single Document. 2024 36th Conference of Open Innovations Association (FRUCT), pp. 609-616. DOI: 10.23919/FRUCT64283.2024.10749871. Online publication date: 30-Oct-2024
  • (2024) MOOC Concept Extraction from Chinese Texts: A Rule and Graph Propagation Based Method. 2024 11th International Conference on Behavioural and Social Computing (BESC), pp. 1-7. DOI: 10.1109/BESC64747.2024.10780751. Online publication date: 16-Aug-2024
  • (2024) AdaptiveUKE: Towards adaptive unsupervised keyphrase extraction with gated topic modeling. Expert Systems with Applications, 250, 123926. DOI: 10.1016/j.eswa.2024.123926. Online publication date: Sep-2024
  • (2022) A Method of Domain Dictionary Construction for Electric Vehicles Disassembly. Entropy, 24:3, 363. DOI: 10.3390/e24030363. Online publication date: 3-Mar-2022
  • (2021) Current Technologies for Information Retrieval of Documents in Digital Libraries: A Survey. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 9:1, pp. 79-91. DOI: 10.29130/dubited.796964. Online publication date: 31-Jan-2021
  • (2021) A novel cluster-based approach for keyphrase extraction from MOOC video lectures. Knowledge and Information Systems. DOI: 10.1007/s10115-021-01568-2. Online publication date: 21-Apr-2021
  • (2020) FLAKE: Fuzzy Graph Centrality-based Automatic Keyword Extraction. The Computer Journal, 65:4, pp. 926-939. DOI: 10.1093/comjnl/bxaa133. Online publication date: 5-Dec-2020
  • (2019) Toward Keyword Extraction in Constrained Information Retrieval in Vehicle Social Network. IEEE Transactions on Vehicular Technology, 68:5, pp. 4285-4294. DOI: 10.1109/TVT.2019.2906799. Online publication date: May-2019
  • (2019) Unstructured Text Resource Access Control Attribute Mining Technology Based on Convolutional Neural Network. IEEE Access, 7, pp. 43031-43041. DOI: 10.1109/ACCESS.2019.2907815. Online publication date: 2019
