research-article

Attention-based Unsupervised Keyphrase Extraction and Phrase Graph for COVID-19 Medical Literature Retrieval

Published: 15 October 2021

Abstract

Searching, reading, and finding information in massive medical text collections is challenging. With a typical biomedical search engine, it is not feasible to navigate each article to find critical information or keyphrases, and few tools visualize the phrases relevant to a query. There is therefore a need to extract keyphrases from each document for indexing and efficient search. The transformer-based neural network BERT has been used for various natural language processing tasks, and its built-in self-attention mechanism can capture associations between words and phrases in a sentence. This research investigates whether self-attention can be used to extract keyphrases from a document in an unsupervised manner, and to identify the relevance between phrases in order to construct a query relevancy phrase graph that visualizes the phrases of the search corpus according to their relevance and importance. A comparison with six baseline methods shows that the self-attention-based unsupervised keyphrase extraction works well on a medical literature dataset. This unsupervised keyphrase extraction model can also be applied to other text data. The query relevancy graph model is applied to the COVID-19 literature dataset, demonstrating that the attention-based phrase graph can successfully identify medical phrases relevant to the query terms.
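As a rough illustration of the two ideas in the abstract, the following Python sketch scores candidate phrases by the self-attention their tokens receive and links phrases by the attention exchanged between them. It is not the authors' method, which the abstract does not specify. It assumes the attention matrix has already been extracted from a BERT-style model and averaged over heads and layers, that special tokens such as [CLS] and [SEP] have been removed, and that candidate phrase spans are given (for example, by a noun-phrase chunker). The function names `phrase_scores`, `phrase_graph`, and `related_to`, the mean-based normalization, and the toy attention values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a phrase is a half-open token range [start, end).
Span = Tuple[int, int]
Matrix = Sequence[Sequence[float]]


def phrase_scores(attention: Matrix, spans: Dict[str, Span]) -> Dict[str, float]:
    """Score each phrase by the mean attention received by its tokens.

    attention[i][j] is the weight token i places on token j, assumed to be
    already averaged over heads and layers (an illustrative choice).
    """
    n = len(attention)
    # Column sums: total attention each token receives from every token, itself included.
    received = [sum(attention[i][j] for i in range(n)) for j in range(n)]
    # Averaging over the phrase's tokens keeps long phrases from winning on length alone.
    return {
        phrase: sum(received[start:end]) / (end - start)
        for phrase, (start, end) in spans.items()
    }


def phrase_graph(attention: Matrix, spans: Dict[str, Span]) -> Dict[Tuple[str, str], float]:
    """Build undirected edges weighted by the mean attention exchanged between two phrases."""
    names = sorted(spans)
    edges: Dict[Tuple[str, str], float] = {}
    for k, a in enumerate(names):
        for b in names[k + 1:]:
            (sa, ea), (sb, eb) = spans[a], spans[b]
            # Count attention in both directions so the edge weight is symmetric.
            total = sum(
                attention[i][j] + attention[j][i]
                for i in range(sa, ea)
                for j in range(sb, eb)
            )
            edges[(a, b)] = total / ((ea - sa) * (eb - sb))
    return edges


def related_to(query: str, edges: Dict[Tuple[str, str], float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the query phrase's neighbours in the graph, strongest edge first."""
    neighbours = [(b if a == query else a, w) for (a, b), w in edges.items() if query in (a, b)]
    return sorted(neighbours, key=lambda item: item[1], reverse=True)[:top_k]


# Toy input: tokens "covid 19 affects lung tissue" with made-up attention rows.
SPANS: Dict[str, Span] = {"covid 19": (0, 2), "affects": (2, 3), "lung tissue": (3, 5)}
ATTN = [
    [0.1, 0.4, 0.1, 0.2, 0.2],
    [0.5, 0.1, 0.1, 0.2, 0.1],
    [0.3, 0.1, 0.1, 0.3, 0.2],
    [0.3, 0.1, 0.1, 0.1, 0.4],
    [0.3, 0.1, 0.1, 0.4, 0.1],
]

if __name__ == "__main__":
    scores = phrase_scores(ATTN, SPANS)
    for phrase in sorted(scores, key=scores.get, reverse=True):
        print(f"{phrase}: {scores[phrase]:.3f}")
    print(related_to("covid 19", phrase_graph(ATTN, SPANS)))
```

With this toy input, "covid 19" receives the highest mean attention (1.15), and its strongest neighbour in the graph is "lung tissue" (edge weight 0.375). A real system might weight layers or heads differently, filter out low-scoring phrases before building the graph, or rank graph nodes with a centrality measure, and each of those choices would change the rankings.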



        • Published in

          ACM Transactions on Computing for Healthcare, Volume 3, Issue 1
          January 2022
          255 pages
          EISSN: 2637-8051
          DOI: 10.1145/3485154

          Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 15 October 2021
          • Accepted: 1 July 2021
          • Revised: 1 May 2021
          • Received: 1 January 2021


          Qualifiers

          • research-article
          • Refereed
