Automatic generation of repeated patient information for tailoring clinical notes

https://doi.org/10.1016/j.ijmedinf.2005.03.008

Summary

Generating clear, readable, and accurate reports can be a time-consuming task for physicians. Clinical notes, which document patient encounters, often contain a set of patient information (demographics, medical history, surgical history, examination results, or the current medical condition) that is propagated from one clinical note to all subsequent clinical notes for the same patient. To reduce this burden, we present a system that automatically generates this repeated patient information when a new clinical note is created. We use semantic patterns and an approximate sequence matching algorithm to capture the discourse role of sentences, which we show to be a useful feature for determining whether a sentence should be repeated. Measured by precision and recall, our system performs better than a simple baseline. We believe such a system would allow clinical notes to be more complete, timely, and accurate.

Introduction

Within a series of reports for a given patient, a portion of the patient's information is often repeated, being carried over from one report in the series to the next. Physicians often spend much time and effort both determining what information needs to be repeated and re-generating (dictating or typing) this repeated information when creating a follow-up report for a given patient [1]. This paper describes a methodology to automatically generate repeated patient information for creating new clinical notes. We define a clinical note to be a report that documents encounters with patients, including information such as demographics, medical history, surgical history, examination results, or the current medical condition. Our belief is that such a system can reduce the total amount of time needed to generate clinical notes and can also lead to more complete and accurate notes. Completeness is achieved because the system will always propagate to subsequent notes the information designated by the physician to be repeated. The accuracy of the repeated information depends on the accuracy of the previous note(s) in the series, so there is an ever-present danger of propagating erroneous information. With a user interface that clearly presents the repeated information to the physician, the content of each clinical note can be thoroughly reviewed before being submitted to the patient's permanent record.

Our approach is to use the role a phrase or sentence plays within the document as a key feature in determining whether it should be repeated. The role a sentence plays, called its discourse role, is the author's intention in using that sentence. We have identified several discourse roles that occur within clinical notes in the pediatric urology domain, and some were found to be repeated more readily than others. The discourse role, together with other words mentioned within the same sentence, was shown to be a good feature set for predicting repetition of sentences in clinical notes with high accuracy. Though we have designed the system with our particular type of clinical note in view, we believe that our methodology is applicable to other documents that share similar characteristics.
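To make the idea of assigning a discourse role by approximate matching more concrete, the following is a minimal sketch, not the system described in this paper. It assumes a Smith-Waterman-style local alignment over tokens as a stand-in for the approximate sequence matching step. The lexicon, the semantic class names, the patterns, the scoring weights, and the `min_ratio` threshold are all hypothetical and chosen only for illustration; `finding-norm` and `finding-abnorm` are used as example role names.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy lexicon mapping words to semantic classes (an assumption).
LEXICON: Dict[str, str] = {
    "kidney": "ANATOMY", "bladder": "ANATOMY",
    "normal": "NORMALITY", "unremarkable": "NORMALITY",
    "hydronephrosis": "ABNORMALITY", "reflux": "ABNORMALITY",
    "shows": "EVIDENCE", "showed": "EVIDENCE", "revealed": "EVIDENCE",
}

# Hypothetical semantic patterns, keyed by discourse role (an assumption).
PATTERNS: Dict[str, List[List[str]]] = {
    "finding-norm": [["ANATOMY", "is", "NORMALITY"]],
    "finding-abnorm": [["EVIDENCE", "ABNORMALITY"]],
}

MATCH, MISMATCH, GAP = 2, -1, -1  # illustrative scoring weights


def to_semantic_tokens(sentence: str) -> List[str]:
    """Lowercase, strip simple punctuation, and map known words to classes."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return [LEXICON.get(word, word) for word in words]


def local_alignment_score(a: Sequence[str], b: Sequence[str]) -> int:
    """Best local alignment score between two token sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Scores never drop below zero, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def discourse_role(sentence: str, min_ratio: float = 0.6) -> Optional[str]:
    """Return the role of the best-matching pattern, or None if no pattern fits."""
    tokens = to_semantic_tokens(sentence)
    best_role, best_ratio = None, 0.0
    for role, patterns in PATTERNS.items():
        for pattern in patterns:
            # Normalize by the score of a perfect match against the pattern.
            ratio = local_alignment_score(tokens, pattern) / (MATCH * len(pattern))
            if ratio > best_ratio:
                best_role, best_ratio = role, ratio
    return best_role if best_ratio >= min_ratio else None


if __name__ == "__main__":
    for text in ("The left kidney is normal.",
                 "Ultrasound showed mild hydronephrosis.",
                 "See you next week."):
        print(text, "->", discourse_role(text))
```

Because the alignment tolerates gaps, a sentence such as "Ultrasound showed mild hydronephrosis." can still match a two-element pattern even though an extra modifier sits between the matched tokens. In a fuller system, the resulting role would be only one feature among several used to decide whether a sentence should be repeated.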

Section snippets

Generating repeated patient information

Our system generates text for documents by extracting text segments from a previous document and inserting them into a new document. We believe copying text verbatim is suitable for generating repeated patient information for the following two reasons:

  • (1) Physicians often use particular phrases to describe medical observations or events, and a system that generates its own language may convey this vital information incorrectly.
  • (2) There is usually no need for the system to

Results

An analysis of the distribution of discourse roles in our corpus resulted in the graph shown in Fig. 7, where it can be seen that over half the sentences fell under the finding-abnorm or finding-norm tags. This corresponds to the fact that clinical notes are written mainly for documenting patient findings. Fig. 8 shows the repeat percentage of sentences using just the discourse role as a determining feature, which turns out to be not very indicative of repeatability by itself. Consequently, we

Discussion

Work in structured reporting [11], [12], [13], [14] has addressed the issue of reducing the time a physician needs to generate routine patient reports. Many of these systems utilize templates, which provide a general structure for the report and allow the physician to fill in the details, and macros, which allow the physician to use a type of shorthand to generate text for the report [15]. Though these systems can allow faster generation of reports, they are good only for generating

Future work

Future work will consist of improving the matching process between two semantic patterns. Work is under way to use the syntactic structure of language to prune unnecessary phrases before a match score is calculated, which will help focus the similarity metric. Work is also under way on determining semantic equivalence at the phrase level, which will allow the system to correlate semantic patterns that do not look alike on the surface but have

Conclusion

We presented a methodology for automatically generating repeated patient information in a series of clinical notes using semantic patterns and approximate sequence matching. Semantic patterns were used to determine discourse roles for sentences in the clinical notes, and based on the discourse role and other features, the system determined whether the sentence should be repeated in a subsequent note or not. Our system was trained on a corpus of pediatric urology clinical notes, and it was able

Acknowledgements

This work was supported in part by a grant from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) PO1-EB00216, and from the National Library of Medicine, T15-LM07356.

References (21)

  • T.F. Smith et al., Identification of common molecular subsequences, J. Mol. Biol. (1981)
  • D. Huske-Kraus, Text generation in clinical medicine—a review, Methods Inf. Med. (2003)
  • E. Riloff, Automatically constructing a dictionary for information extraction tasks
  • E. Riloff, Automatically generating extraction patterns from untagged text
  • I. Muslea, Extraction patterns for information extraction tasks: a survey. The AAAI-99 Workshop on Machine Learning for...
  • S.B. Huffman, Learning information extraction patterns from examples, in: Proceedings of the 1995 IJCAI Workshop on New...
  • R. Barzilay et al., Learning to paraphrase: an unsupervised approach using multiple-sequence alignment
  • R.K. Taira et al., Automatic structuring of radiology free-text reports, Radiology (2001)
  • V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Phys. Doklady (1966)
  • R. Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998)
There are more references available in the full text version of this article.
