Original Research
A hybrid system to understand the relations between assessments and plans in progress notes

https://doi.org/10.1016/j.jbi.2023.104363

Highlights

  • An NLP pipeline that achieved top-3 performance in the 2022 n2c2 Track 3.

  • Fine-tuned transformer-based models to understand the semantics of progress notes.

  • Integrated medical ontology and order information to improve the transformer models.

  • Designed an ensemble strategy to further boost our system performance.

Abstract

Objective:

The paper presents a novel solution to the 2022 National NLP Clinical Challenges (n2c2) Track 3, which aims to predict the relations between assessment and plan subsections in progress notes.

Methods:

Our approach goes beyond standard transformer models and incorporates external information such as medical ontology and order information to comprehend the semantics of progress notes. We fine-tuned transformers to understand the textual data and incorporated medical ontology concepts and their relationships to enhance the model’s accuracy. We also captured order information that regular transformers cannot by taking into account the position of the assessment and plan subsections in progress notes.

Results:

Our submission earned third place in the challenge phase with a macro-F1 score of 0.811. After refining our pipeline further, we achieved a macro-F1 of 0.826, outperforming the top-performing system during the challenge phase.

Conclusion:

Our approach, which combines fine-tuned transformers, medical ontology, and order information, outperformed other systems in predicting the relationships between assessment and plan subsections in progress notes. This highlights the importance of incorporating external information beyond textual data in natural language processing (NLP) tasks related to medical documentation. Our work could potentially improve the efficiency and accuracy of progress note analysis.

Introduction

Progress notes document patients’ health status and contain rich information such as medical observations, clinical diagnoses, and treatment plans. They are loosely organized as they are written by physicians in free-form text, and they typically follow the SOAP (Subjective, Objective, Assessment, and Plan) format [1]. The Subjective section describes a patient’s experience or feelings, the Objective section lists objective evidence such as vital signs and lab results, the Assessment section summarizes the patient’s problems based on both subjective and objective evidence, and the Plan section details the treatment plans to address these problems. The Assessment and Plan sections are crucial components of a progress note as they identify a patient’s main diagnosis and associated symptoms/complications, and outline the plans to address them [2]. Typically, one diagnosed problem has multiple associated treatment plans, and identifying the relations between Assessment and Plan section pairs (AP pairs) is essential for downstream tasks such as problem list generation [3], personalized treatment [4], and knowledge graph construction [5]. In practice, deciding the relations between AP pairs usually requires clinical reasoning and domain knowledge, and typically relies on manual chart review by physicians [6]. Therefore, developing a high-quality automatic reasoning tool to identify such relations would be useful and cost-effective. Unlike prior clinical NLP research efforts that focus mostly on understanding the local context at the word or sentence level in solving tasks such as named entity recognition (NER) and relation extraction (RE) [7], [8], reasoning about the relation between AP pairs requires understanding longer texts at the section level. Such a task is more challenging and has been largely unexplored.
To address this challenge, the 2022 National NLP Clinical Challenges (n2c2) Track 3 was organized to solicit solutions for predicting the relation between AP pairs in progress notes.

A natural choice for tackling NLP problems is to leverage the advantages of the popular bidirectional encoder representations from transformers (BERT) language model [9]. BERT converts text tokens into dense vectors which are further encoded through a stack of multi-head self-attention layers. These models can be pretrained on massive text corpora with a language modeling objective and used to produce meaningful contextualized representations of words. Variants of BERT have demonstrated promising performance not only in general NLP tasks [10], [11], [12], but also in clinical domains such as clinical information extraction and relation identification [13], [14], [15], [16]. In our study, we leveraged fine-tuned transformers to understand the semantics of text in AP pairs in progress notes. However, important (external) information may be missed when using transformer models on AP pairs. For example, if a plan subsection directly targets the main problem in the assessment, the main problem is likely to be discussed in the plan subsection as well. Because progress notes are written in free text by physicians and can contain a mix of medical jargon, synonyms, and abbreviations, it is difficult for transformer models to capture the co-occurrence of similar medical concepts in the AP pairs. In addition, as one assessment is usually associated with multiple plan subsections, physicians often rank treatment plans in order of relevance, with the most relevant treatment targeting the main problem being listed first and the least relevant being listed last. This order information over multiple sentences may be lost if it is not provided to a regular transformer model.
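The difficulty with surface forms can be illustrated with a toy example. Entity linkers such as MetaMap or MedCAT normalize free-text mentions to ontology concept codes; the mini-dictionary below is a hypothetical stand-in for such a linker, not a real one, but it shows why concept-level overlap succeeds where word-level overlap fails:

```python
# Hypothetical stand-in for an entity linker (e.g. MetaMap/MedCAT mapping
# mentions to ICD-10 codes). Surface forms differ; concept codes agree.
TOY_CONCEPT_MAP = {
    "mi": "I21",                      # abbreviation for myocardial infarction
    "myocardial infarction": "I21",
    "heart attack": "I21",
    "htn": "I10",
    "hypertension": "I10",
}

def concepts(text: str) -> set:
    """Return the concept codes whose surface forms appear in the text."""
    text = text.lower()
    return {code for term, code in TOY_CONCEPT_MAP.items() if term in text}

assessment = "62M with acute MI, now stable."
plan = "Continue aspirin and statin for myocardial infarction."

# Word-level overlap finds no shared problem term...
surface_overlap = set(assessment.lower().split()) & set(plan.lower().split())
# ...but concept-level overlap reveals that both mention the same condition.
concept_overlap = concepts(assessment) & concepts(plan)
print(concept_overlap)  # {'I21'}
```

A binary indicator derived from such overlap is exactly the kind of external signal a plain transformer may struggle to recover from abbreviations alone.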
In prior works, the combined use of deep neural networks with external information has achieved excellent performance in medical tasks such as medical image segmentation [17], medical NER [18], mortality prediction [19], infant image classification [20], and clinical note sentence similarity identification [21].

Inspired by the successful utilization of external information in prior works, we developed novel approaches to integrate external information with pretrained transformer models for identifying relations of AP pairs. We carefully constructed features from external information tailored for reasoning about the relations between AP pairs. In particular, we leveraged medical ontology and existing entity recognition tools to inform the model of the existence of similar medical condition concepts, and extracted the ranking of each plan subsection under the associated assessment. Such information was converted to numerical vectors and concatenated with the output of the transformer models. The pretrained transformer models were fine-tuned with the external information on downstream labeled data for assessment and plan relation prediction. We then employed an ensemble strategy that further improved model performance by combining the predictions of different individual transformer-based models. Among all participants in 2022 n2c2 Track 3, our best submission ranked third with a macro-F1 of 0.811. After the challenge phase, we refined our system by adopting a more updated NER tool when utilizing the medical ontology and achieved a macro-F1 of 0.826, higher than the best-performing team (a macro-F1 of 0.821) during the challenge phase. Our work demonstrates the effectiveness of a hybrid system that integrates the strengths of transformer models with external information for understanding the relations between the Assessment section and Plan subsections in progress notes.
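The concatenation step described above can be sketched as follows. This is a minimal illustration rather than the authors' exact architecture: the hidden size, feature definitions, and function names are assumptions, and a zero vector stands in for a fine-tuned encoder's output.

```python
import numpy as np

HIDDEN = 8  # stand-in for the encoder's hidden size (e.g. 768 in BERT-base)

def build_external_features(shared_concepts: int, plan_rank: int, n_plans: int) -> np.ndarray:
    """Two hand-crafted features: an ontology co-occurrence indicator and
    the plan subsection's normalized position under its assessment."""
    return np.array([
        1.0 if shared_concepts > 0 else 0.0,   # similar-concept indicator
        plan_rank / max(n_plans - 1, 1),       # 0.0 = listed first, 1.0 = listed last
    ])

cls_embedding = np.zeros(HIDDEN)  # placeholder for the transformer's pooled output
external = build_external_features(shared_concepts=1, plan_rank=0, n_plans=4)

# The augmented vector is what a classification head would consume.
combined = np.concatenate([cls_embedding, external])
print(combined.shape)  # (10,)
```

The point of the design is that the classifier sees both the contextual representation and signals (concept co-occurrence, plan order) that the encoder cannot reliably infer from the token sequence alone.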

Section snippets

Methods

In this study, we developed a procedure to identify similar medical condition concepts in AP pairs by constructing ICD-10 indicators using existing NER tools and medical ontology. We also extracted the order of each plan subsection under their associated assessment and designed a tailored concept embedding layer for regular transformer models. By integrating external information and fine-tuning these transformer models, we were able to ensemble the predictions and output a deterministic label
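One simple way to turn several models' predictions into a single deterministic label is soft voting: average each model's class probabilities and take the argmax. The sketch below assumes this scheme; the label names and probability values are illustrative, not taken from the paper.

```python
import numpy as np

# Assumed four-way relation label set for an AP pair (illustrative only).
LABELS = ["Direct", "Indirect", "Neither", "Not Relevant"]

def ensemble_predict(prob_matrix: np.ndarray) -> str:
    """Soft-voting ensemble.

    prob_matrix has shape (n_models, n_labels); each row is one model's
    probability distribution over the labels for the same AP pair.
    """
    mean_probs = prob_matrix.mean(axis=0)
    return LABELS[int(mean_probs.argmax())]

# Three hypothetical fine-tuned transformers disagree on a borderline pair;
# averaging their distributions yields one deterministic label.
probs = np.array([
    [0.60, 0.25, 0.10, 0.05],
    [0.35, 0.45, 0.15, 0.05],
    [0.55, 0.30, 0.10, 0.05],
])
print(ensemble_predict(probs))  # Direct
```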

Results

We present the experimental results for predicting the relations of AP pairs. We examined the numerical performance of our system using either MetaMap or MedCat as the entity recognition tool. We also include a fine-tuned transformer model without external information as an ablation study. Our system performs better when using MedCat as the entity recognition tool due to its ability to identify more medical condition concepts from AP pairs. We also demonstrate that our ensemble strategy is

Discussion

In this study, we developed a computational system that combines fine-tuned transformers with external information to understand the relationship between assessment and plan subsections in progress notes. This system achieved third place in 2022 n2c2 Track 3 among all participants and a refined version outperformed the top performer in the challenge phase. The key strengths of our system include the use of EntityBERT, which is fine-tuned to understand the semantics of AP pairs enriched with

Conclusion

In this paper, we introduce a hybrid system that aims to understand the relationship between the assessment and plans in progress notes. To achieve this, we have integrated medical ontology and ranking information into fine-tuned transformers. The resulting system performed exceptionally well in the 2022 n2c2 Track 3 evaluation. Our work demonstrates the potential for combining deep learning models with external information to create automatic tools for interpreting unstructured progress notes.

CRediT authorship contribution statement

Jifan Gao: Conceptualization, Methodology, Software, Visualization, Formal analysis, Data curation, Writing – original draft, Writing – review & editing. Shilu He: Methodology, Software, Writing – review & editing. Junjie Hu: Conceptualization, Methodology, Software, Supervision, Writing – review & editing. Guanhua Chen: Conceptualization, Methodology, Formal analysis, Supervision, Funding acquisition, Resources, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Guanhua Chen and Junjie Hu are co-senior authors of this paper who jointly supervised the project. This project was partially supported by the University of Wisconsin School of Medicine and Public Health from the Wisconsin Partnership Program (the Protocol Development, Informatics, and Biostatistics Module).

Code availability

The code that implements our system is available at https://github.com/GGGGFan/n2c2_track3.

References (43)

  • Stupp D. et al.

    Structured understanding of assessment and plans in clinical documentation

    (2022)
  • Fan Y. et al.

    Deep learning approaches for extracting adverse events and indications of dietary supplements from clinical text

    J. Am. Med. Inform. Assoc.

    (2021)
  • Ramachandran G.K. et al.

    Extracting medication changes in clinical narratives using pre-trained language models

    (2022)
  • Devlin J. et al.

    BERT: Pre-training of deep bidirectional transformers for language understanding

    (2018)
  • Liu Y. et al.

    RoBERTa: A robustly optimized BERT pretraining approach

    (2019)
  • A. Yates, R. Nogueira, J. Lin, Pretrained transformers for text ranking: BERT and beyond, in: Proceedings of the 14th...
  • Ganesh P. et al.

    Compressing large-scale transformer-based models: A case study on BERT

    Trans. Assoc. Comput. Linguist.

    (2021)
  • Liu N. et al.

    Med-BERT: a pretraining framework for medical records named entity recognition

    IEEE Trans. Ind. Inform.

    (2021)
  • Yang X. et al.

    Clinical relation extraction using transformer-based models

    (2021)
  • Yang F. et al.

    Transformers-sklearn: a toolkit for medical language understanding with transformer-based models

    BMC Med. Inform. Decis. Mak.

    (2021)
  • S. Hebbar, Y. Xie, CovidBERT-biomedical relation extraction for COVID-19, in: The International FLAIRS Conference...