research-article

Attention-based Unsupervised Keyphrase Extraction and Phrase Graph for COVID-19 Medical Literature Retrieval

Authors:
Haoran Ding

Indiana University-Purdue University Indianapolis, Indianapolis, IN

Xiao Luo

Indiana University-Purdue University Indianapolis, Indianapolis, IN

ACM Transactions on Computing for Healthcare, Volume 3, Issue 1, Article No.: 12, pp 1–16, https://doi.org/10.1145/3473939

Published: 15 October 2021

Abstract

Searching, reading, and finding information in massive medical text collections is challenging. A typical biomedical search engine cannot feasibly navigate each article to find critical information or keyphrases. Moreover, few tools visualize the phrases relevant to a query. There is therefore a need to extract keyphrases from each document for indexing and efficient search. Transformer-based neural networks such as BERT have been used for various natural language processing tasks. Their built-in self-attention mechanism can capture the associations between words and phrases in a sentence. This research investigates whether self-attention can be used to extract keyphrases from a document in an unsupervised manner, and to identify relevancy between phrases in order to construct a query relevancy phrase graph that visualizes the phrases of the search corpus by their relevancy and importance. A comparison with six baseline methods shows that the self-attention-based unsupervised keyphrase extraction works well on a medical literature dataset. This unsupervised keyphrase extraction model can also be applied to other text data. The query relevancy graph model is applied to the COVID-19 literature dataset to demonstrate that the attention-based phrase graph can successfully identify the medical phrases relevant to the query terms.
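
The abstract does not spell out how attention weights are turned into phrase scores or graph edges, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a token-to-token attention matrix is already available (for example, averaged over the heads of a BERT layer) and that candidate phrases are given as token spans. The function names `phrase_scores` and `phrase_graph`, the averaging scheme, the threshold, and the toy data are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]      # half-open [start, end) token range of a candidate phrase
Matrix = List[List[float]]  # attention[i][j]: how much token i attends to token j


def phrase_scores(attention: Matrix, spans: Dict[str, Span]) -> Dict[str, float]:
    """Score each phrase by the mean attention it receives from tokens outside it.

    Assumption for this sketch: a phrase that other tokens attend to strongly
    is treated as a more important keyphrase candidate.
    """
    n = len(attention)
    scores: Dict[str, float] = {}
    for phrase, (start, end) in spans.items():
        outside = [i for i in range(n) if not (start <= i < end)]
        received = sum(attention[i][j] for i in outside for j in range(start, end))
        scores[phrase] = received / max(len(outside), 1)
    return scores


def phrase_graph(
    attention: Matrix, spans: Dict[str, Span], threshold: float
) -> Dict[Tuple[str, str], float]:
    """Connect two phrases when their mean mutual attention reaches the threshold."""
    edges: Dict[Tuple[str, str], float] = {}
    names = sorted(spans)
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            (s1, e1), (s2, e2) = spans[a], spans[b]
            a_to_b = sum(attention[i][j] for i in range(s1, e1) for j in range(s2, e2))
            b_to_a = sum(attention[i][j] for i in range(s2, e2) for j in range(s1, e1))
            # Average over both directions and over all token pairs.
            weight = (a_to_b + b_to_a) / (2 * (e1 - s1) * (e2 - s2))
            if weight >= threshold:
                edges[(a, b)] = weight
    return edges


# Hypothetical toy input: five tokens and a made-up attention matrix (rows sum to 1).
TOKENS = ["covid", "patients", "with", "severe", "pneumonia"]
ATTENTION: Matrix = [
    [0.2, 0.3, 0.1, 0.1, 0.3],
    [0.3, 0.2, 0.1, 0.1, 0.3],
    [0.1, 0.2, 0.1, 0.2, 0.4],
    [0.1, 0.1, 0.1, 0.2, 0.5],
    [0.1, 0.2, 0.1, 0.3, 0.3],
]
SPANS: Dict[str, Span] = {"covid patients": (0, 2), "severe pneumonia": (3, 5)}

if __name__ == "__main__":
    ranked = sorted(phrase_scores(ATTENTION, SPANS).items(), key=lambda kv: -kv[1])
    print("ranked phrases:", ranked)
    print("graph edges:", phrase_graph(ATTENTION, SPANS, threshold=0.1))
```

In a real pipeline the attention matrix would come from a pretrained model and the candidate spans from a noun-phrase chunker; both are stubbed out here so that the scoring and graph-building steps can be read in isolation.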

Published in
ACM Transactions on Computing for Healthcare Volume 3, Issue 1
January 2022
255 pages
EISSN: 2637-8051
DOI: 10.1145/3485154
Editors:
Insup Lee, University of Pennsylvania, USA
John A. Stankovic, University of Virginia, USA
Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 October 2021
- Accepted: 1 July 2021
- Revised: 1 May 2021
- Received: 1 January 2021
Author Tags
Keyphrase extraction
deep learning
medical information retrieval
COVID-19
Qualifiers
- research-article
- Refereed