DOI: 10.1145/1099554.1099628

Domain-specific keyphrase extraction

Published: 31 October 2005

Abstract

Document keyphrases provide semantic metadata that characterizes a document and gives an overview of its content. They can be used in many text-mining and knowledge-management applications. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified domain keyphrases to assign weights to candidate keyphrases. The logic of our algorithm is: the more keywords a candidate keyphrase contains, and the more significant these keywords are, the more likely the candidate is a keyphrase. To obtain prior positive inputs, KIP first populates its glossary database with manually identified keyphrases and keywords. It then checks the composition of every noun phrase in a document, looks up the database, and calculates a score for each noun phrase. The noun phrases with the highest scores are extracted as keyphrases.
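
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not KIP's actual implementation: the abstract does not give the weighting formula, so this sketch assumes that a keyword's weight is simply how often it occurs in the human-identified keyphrases, and that a candidate's score is the sum of its keyword weights plus the weight of the whole phrase if it appears in the glossary. The function names (`build_glossary`, `score_candidate`, `extract_keyphrases`) are hypothetical, and candidate noun phrases are assumed to be supplied by a separate noun-phrase extraction step.

```python
from collections import Counter


def build_glossary(keyphrases):
    """Build phrase and keyword weights from human-identified keyphrases.

    Assumption: a weight is a raw occurrence count. The paper's real
    weighting scheme is not described in the abstract.
    """
    phrase_weights = Counter()
    word_weights = Counter()
    for phrase in keyphrases:
        normalized = phrase.lower().strip()
        phrase_weights[normalized] += 1
        for word in normalized.split():
            word_weights[word] += 1
    return phrase_weights, word_weights


def score_candidate(candidate, phrase_weights, word_weights):
    """Score one candidate noun phrase.

    More known keywords, and more heavily weighted ones, give a higher
    score. A Counter returns 0 for words or phrases it has not seen.
    """
    normalized = candidate.lower().strip()
    word_score = sum(word_weights[word] for word in normalized.split())
    return word_score + phrase_weights[normalized]


def extract_keyphrases(candidates, phrase_weights, word_weights, top_n=3):
    """Return up to top_n (phrase, score) pairs with a positive score."""
    scored = {
        c: score_candidate(c, phrase_weights, word_weights) for c in candidates
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [(phrase, score) for phrase, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    phrases, words = build_glossary(
        ["text mining", "keyphrase extraction", "data mining"]
    )
    noun_phrases = ["text mining", "mining software", "coffee break"]
    print(extract_keyphrases(noun_phrases, phrases, words))
```

In this toy run, "text mining" ranks first because both of its words are known keywords and the whole phrase is in the glossary, while "coffee break" contains no known keywords and is dropped.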

Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN: 1595931406
DOI: 10.1145/1099554

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document keyphrase
  2. document metadata
  3. keyphrase extraction
  4. text mining

Qualifiers

  • Article

Conference

CIKM05
CIKM05: Conference on Information and Knowledge Management
October 31 - November 5, 2005
Bremen, Germany

Acceptance Rates

CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2023) Contextual topic discovery using unsupervised keyphrase extraction and hierarchical semantic graph model. Journal of Big Data 10:1. DOI: 10.1186/s40537-023-00833-1. Online publication date: 12-Oct-2023
  • (2023) Exploring Potential Drivers Influencing BIM Adoption in the AEC Industry: A Systematic Approach. 2023 IEEE International Conference on Computing (ICOCO) (124-129). DOI: 10.1109/ICOCO59262.2023.10397641. Online publication date: 9-Oct-2023
  • (2023) Incorporation of company-related factual knowledge into pre-trained language models for stock-related spam tweet filtering. Expert Systems with Applications 234 (121021). DOI: 10.1016/j.eswa.2023.121021. Online publication date: Dec-2023
  • (2022) Intelligent RFQ Summarization Using Natural Language Processing, Text Mining, and Machine Learning Techniques. Journal of Global Information Management 30:1 (1-26). DOI: 10.4018/JGIM.309082. Online publication date: 10-Aug-2022
  • (2022) Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach. Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD) (120-124). DOI: 10.1145/3493700.3493702. Online publication date: 8-Jan-2022
  • (2022) Keyphrases Frequency Analysis From Research Articles: A Region-Based Unsupervised Novel Approach. IEEE Access 10 (120838-120849). DOI: 10.1109/ACCESS.2022.3198959. Online publication date: 2022
  • (2022) Designing an efficient unigram keyword detector for documents using Relative Entropy. Multimedia Tools and Applications 81:26 (37747-37761). DOI: 10.1007/s11042-022-12657-x. Online publication date: 22-Apr-2022
  • (2022) A University Portrait System Incorporating Academic Social Network. Computer Supported Cooperative Work and Social Computing (25-36). DOI: 10.1007/978-981-19-4549-6_3. Online publication date: 22-Jul-2022
  • (2021) Unsupervised Keyword Combination Query Generation from Online Health Related Content for Evidence-Based Fact Checking. The 23rd International Conference on Information Integration and Web Intelligence (267-277). DOI: 10.1145/3487664.3487701. Online publication date: 29-Nov-2021
  • (2021) Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set. International Journal of Fuzzy Systems 24:3 (1332-1342). DOI: 10.1007/s40815-021-01190-y. Online publication date: 21-Nov-2021
