DOI: 10.1145/3307339.3342147
research-article

NamedKeys: Unsupervised Keyphrase Extraction for Biomedical Documents

Published: 04 September 2019

Abstract

A vast amount of biomedical literature is generated and digitized every year. As a result, there is a growing need to develop methods for discovering, accessing, and sharing knowledge from medical literature. Keyphrase extraction is the task of summarizing a text by identifying its key concepts. Keyphrases can be single-word or multi-word linguistic units that concisely represent a document. Although a variety of models have been proposed for automated keyphrase extraction, performance remains poor in comparison with other natural language processing tasks. The problem is even more daunting in the biomedical domain, where the text is filled with highly domain-specific terminology. We propose a new method, NamedKeys, to automatically identify meaningful and informative keyphrases from biomedical text. NamedKeys integrates named entity recognition, phrase embedding, phrase quality scoring, ranking, and clustering to extract author-assigned keywords from biomedical documents. Performance evaluation on PubMed abstracts demonstrates that NamedKeys achieves significant improvements over existing state-of-the-art keyphrase extraction models. Furthermore, we propose the first benchmark dataset for keyphrase extraction from biomedical text.
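
The abstract frames the task as recovering author-assigned keywords, so one common way to score an extractor is exact-match precision, recall, and F1 over its top-k predictions. The sketch below illustrates that idea only; the abstract does not state which metric the authors used, and the function names `normalize` and `prf_at_k` and the predicted phrases in the usage note are hypothetical.

```python
from typing import Iterable


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(phrase.lower().split())


def prf_at_k(predicted: list[str], gold: Iterable[str], k: int) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 of the top-k predicted keyphrases.

    Assumption: `predicted` is already sorted from most to least relevant.
    """
    gold_set = {normalize(g) for g in gold}

    # Keep the first k distinct predictions (after normalization), in rank order.
    seen: set[str] = set()
    top_k: list[str] = []
    for p in predicted:
        if len(top_k) >= k:
            break
        n = normalize(p)
        if n not in seen:
            seen.add(n)
            top_k.append(n)

    hits = sum(1 for p in top_k if p in gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For instance, with this article's four author tags as the gold set and a made-up ranked prediction list `["Keyphrase extraction", "biomedical text", "phrase  embedding", "PubMed"]`, `prf_at_k(..., k=4)` matches two of four predictions and two of four gold phrases. Exact matching is strict: a near-synonym such as "keyword extraction" would count as a miss, which is one reason reported scores for this task tend to be low.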

Published In

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
September 2019
716 pages
ISBN:9781450366663
DOI:10.1145/3307339
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concept extraction
  2. document summarization
  3. keyphrase extraction
  4. phrase embedding

Qualifiers

  • Research-article

Conference

BCB '19
Acceptance Rates

BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 1
Reflects downloads up to 19 Feb 2025

Cited By

  • (2024) A Centrality-Weighted Bidirectional Encoder Representation from Transformers Model for Enhanced Sequence Labeling in Key Phrase Extraction from Scientific Texts. Big Data and Cognitive Computing 8:12 (182). DOI: 10.3390/bdcc8120182. Online publication date: 4-Dec-2024
  • (2024) Cross-Domain Robustness of Transformer-Based Keyphrase Generation. Data Analytics and Management in Data Intensive Domains (249-265). DOI: 10.1007/978-3-031-67826-4_19. Online publication date: 1-Oct-2024
  • (2022) Domain-Specific Keyword Extraction Using Joint Modeling of Local and Global Contextual Semantics. ACM Transactions on Knowledge Discovery from Data 16:4 (1-30). DOI: 10.1145/3494560. Online publication date: 8-Jan-2022
  • (2022) Evaluating Keyphrase Extraction Methods for Clustering Influenza-Related Scientific Papers. 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS) (1-7). DOI: 10.1109/ICSPIS56952.2022.10043863. Online publication date: 28-Dec-2022
  • (2021) Unsupervised Keyword Combination Query Generation from Online Health Related Content for Evidence-Based Fact Checking. The 23rd International Conference on Information Integration and Web Intelligence (267-277). DOI: 10.1145/3487664.3487701. Online publication date: 29-Nov-2021
  • (2021) Attention-based Unsupervised Keyphrase Extraction and Phrase Graph for COVID-19 Medical Literature Retrieval. ACM Transactions on Computing for Healthcare 3:1 (1-16). DOI: 10.1145/3473939. Online publication date: 15-Oct-2021
  • (2021) Keyword Extraction from Biomedical Documents Using Deep Contextualized Embeddings. 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA) (1-5). DOI: 10.1109/INISTA52262.2021.9548470. Online publication date: 25-Aug-2021
  • (2021) Uncertainty-based Self-training for Biomedical Keyphrase Extraction. 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) (1-4). DOI: 10.1109/BHI50953.2021.9508592. Online publication date: 27-Jul-2021
  • (2020) Ir-Man. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (1-9). DOI: 10.1145/3388440.3412417. Online publication date: 21-Sep-2020
