ABSTRACT
Finding patients with specific clinical conditions, such as a family history of diabetes, is an important task for clinical decision support. Clinical notes in Electronic Health Records (EHRs), which document a patient's medical history and family disease history, are valuable resources for patient cohort selection. However, such information is difficult to discover in clinical text, and full-text search techniques often fail due to the unique characteristics of clinical language. We describe SearchEHR, a system that combines Natural Language Processing (NLP) and Information Retrieval (IR) techniques to help find patient cohorts from clinical notes, with a special focus on family disease history.
Index Terms
- SearchEHR: A Family History Search System for Clinical Decision Support