Research article · DOI: 10.1145/3459637.3482273

MedRetriever: Target-Driven Interpretable Health Risk Prediction via Retrieving Unstructured Medical Text

Published: 30 October 2021

Abstract

The broad adoption of electronic health record (EHR) systems and advances in deep learning have motivated the development of health risk prediction models, which mainly depend on the expressiveness and temporal modeling capacity of deep neural networks (DNNs) to improve prediction performance. Some models further augment prediction with external knowledge; however, a great deal of EHR information is inevitably lost during knowledge mapping. In addition, predictions made by existing models usually lack reliable interpretation, which undermines their trustworthiness in guiding clinical decision-making. To address these challenges, we propose MedRetriever, an effective and flexible framework that leverages unstructured medical text collected from authoritative websites both to augment health risk prediction and to provide understandable interpretation. MedRetriever also explicitly takes the target disease documents into consideration, which provide key guidance for the model to learn in a target-driven direction, i.e., from the target disease to the input EHR. Specifically, MedRetriever can flexibly choose its backbone from major predictive models to learn the EHR embedding for each visit. The EHR embedding and the features of the target disease documents are then aggregated into a query by self-attention to retrieve highly relevant text segments from the medical text pool, which are then stored in a dynamically updated text memory. Finally, the comprehensive EHR embedding and the text memory are used for prediction and interpretation. We evaluate MedRetriever against nine state-of-the-art approaches on three real-world EHR datasets. It consistently achieves the best performance in AUC and recall and outperforms the best baseline by at least 4.8% in recall on the three test datasets. Furthermore, we conduct case studies to show the easy-to-understand interpretation provided by MedRetriever.
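
The retrieval step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`build_query`, `retrieve`), the mean-pooling after self-attention, the dot-product relevance score, and all vectors and segment labels below are assumptions made purely for illustration. It only shows the general idea of combining a visit embedding with target-document features into a single query and keeping the best-matching text segments in a memory.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_query(vectors):
    """Scaled dot-product self-attention over the input vectors,
    followed by mean-pooling into one query vector.
    (The pooling choice is an assumption, not taken from the paper.)"""
    d = len(vectors[0])
    attended = []
    for q in vectors:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in vectors])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        )
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]

def retrieve(query, text_pool, top_k):
    """Score every (segment, embedding) pair against the query by dot
    product and return the top_k (segment, score) pairs, best first."""
    scored = [(segment, dot(query, emb)) for segment, emb in text_pool]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy data: one visit embedding, one target-document feature
# vector, and a pool of three text segments with made-up embeddings.
visit_embedding = [1.0, 0.0, 0.0]
target_doc_features = [[0.0, 1.0, 0.0]]
text_pool = [
    ("segment_a", [1.0, 1.0, 0.0]),
    ("segment_b", [1.0, 0.0, 0.0]),
    ("segment_c", [0.0, 0.0, 1.0]),
]

query = build_query([visit_embedding] + target_doc_features)

# The text memory is updated with the segments retrieved for this visit.
text_memory = []
text_memory.extend(retrieve(query, text_pool, top_k=2))

print([segment for segment, _ in text_memory])
```

In this toy setup the query leans equally on the visit and the target document, so the segment related to both ranks first. A real system would use learned embeddings and a trained attention module rather than fixed vectors.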

Supplementary Material

MP4 File (CIKM21-rgfp0552.mp4)
In this presentation, the authors cover the proposed MedRetriever structure for the health risk prediction task. First, they introduce the background of health risk prediction and the gap between existing methods and the expectation of reliable, understandable interpretability. They then describe unstructured medical text and how MedRetriever uses this type of data to improve health risk prediction performance. Finally, the authors discuss the experimental results and summarize their work.

      Published In

      CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
      October 2021
      4966 pages
      ISBN:9781450384469
      DOI:10.1145/3459637
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data mining
      2. electronic health records
      3. external knowledge
      4. health risk prediction

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2024) EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3549-3559. DOI: 10.1145/3627673.3679582. Online publication date: 21-Oct-2024
      • (2024) Interpretable Disease Prediction via Path Reasoning over medical knowledge graphs and admission history. Knowledge-Based Systems 281:C. DOI: 10.1016/j.knosys.2023.111082. Online publication date: 1-Feb-2024
      • (2024) TransLSTD: Augmenting hierarchical disease risk prediction model with time and context awareness via disease clustering. Information Systems 124 (102390). DOI: 10.1016/j.is.2024.102390. Online publication date: Sep-2024
      • (2023) Toward attention-based learning to predict the risk of brain degeneration with multimodal medical data. Frontiers in Neuroscience 16. DOI: 10.3389/fnins.2022.1043626. Online publication date: 18-Jan-2023
      • (2023) Knowledge-Enhanced Difference-Aware Clinical Time Series Representation Learning for Diagnosis Prediction. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 1014-1021. DOI: 10.1109/BIBM58861.2023.10385859. Online publication date: 5-Dec-2023
      • (2023) Predicting line of therapy transition via similar patient augmentation. Journal of Biomedical Informatics 147 (104511). DOI: 10.1016/j.jbi.2023.104511. Online publication date: Nov-2023
      • (2023) Integrating domain knowledge for biomedical text analysis into deep learning. Journal of Biomedical Informatics 143:C. DOI: 10.1016/j.jbi.2023.104418. Online publication date: 1-Jul-2023
      • (2022) Generative Adversarial Networks Enhanced Pre-training for Insufficient Electronic Health Records Modeling. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3810-3818. DOI: 10.1145/3534678.3539020. Online publication date: 14-Aug-2022
      • (2022) MedSkim: Denoised Health Risk Prediction via Skimming Medical Claims Data. 2022 IEEE International Conference on Data Mining (ICDM), 81-90. DOI: 10.1109/ICDM54844.2022.00018. Online publication date: Nov-2022
      • (2022) DL-BERT: a time-aware double-level BERT-style model with pre-training for disease prediction. 2022 IEEE International Conference on Big Data (Big Data), 1801-1808. DOI: 10.1109/BigData55660.2022.10020513. Online publication date: 17-Dec-2022