1 Introduction

It has been shown that accessing their own electronic health records (EHR) can enhance patients’ medical understanding and provide clinically relevant benefits, including improved adherence (Delbanco et al. 2012). Several studies have found that providing knowledge can improve diabetes-related health outcomes (Krishna and Boren 2008). Patients have also expressed interest in accessing their own records (Wiljer et al. 2006). However, EHR notes present unique challenges to the average patient. Since these notes are not usually targeted at patients (Mossanen et al. 2014), language that may be difficult for non-medical professionals to comprehend is prevalent, including medical terms, abbreviations, and medical domain-specific language patterns.

To address this challenge, we are developing the NoteAid system. In previous work, NoteAid linked medical jargon to definitions (Polepalli Ramesh et al. 2013). In this work, we extend NoteAid to link medical notes to targeted, easy-to-understand education materials from trusted resources. Since EHR notes are patient-specific, the linking provides a means of tailored education for patients who might otherwise not comprehend their medical conditions. We speculate that NoteAid has the potential to improve patients’ understanding of their own clinical conditions and their health knowledge, which in turn can lead to improved self-managed care and clinical outcomes.

The rest of the paper is organized as follows. Section 2 reviews related work, including health literacy and comprehension, domain-specific information retrieval, and biomedical information retrieval. Section 3 covers the education materials, the evaluation data, and the evaluation metrics. Section 4 presents the methods to construct queries from EHR notes. Experimental results and discussion are presented in Sect. 5. We conclude with limitations and future plans in Sect. 6.

2 Related work

There is a wealth of work on health literacy and comprehension. Medical jargon, which is prevalent in EHR notes, is one evident barrier to patients’ understanding (Keselman et al. 2007). Zeng and Tse (2006) and Smith and Fellbaum (2004) created mappings between medical and consumer terminologies. Elhadad (2006) employed unsupervised methods to identify difficult terms and retrieved definitions using commercial search engines. Kandula et al. (2010) developed tools to simplify difficult terms. Grabar and Hamon (2014) used morphological analysis and text mining to collect paraphrases for medical terms. Providing definitions of medical jargon has also been shown to improve the readability of EHR notes. For instance, our NoteAid system (Polepalli Ramesh et al. 2013) identifies medical concepts and fetches definitions from UMLS, MedlinePlus, and Wikipedia; evaluation has shown significant improvement in self-reported comprehension.

Improving readability alone does not fully exploit the benefits of access to EHR notes. High quality information obtained through education materials can potentially lead to better outcomes (Bodenheimer et al. 2002). The Patient Clinical Information System (Cimino et al. 2002) provides patients with online information resources and educational aides, and evaluations by patients have been positive; however, no automated linking has been reported. The Infobutton Manager Project (Cimino 2006; Cimino et al. 1997) links EHR notes to other information resources (e.g., drug databases, Google, PubMed, AskHERMES (Cao et al. 2011)). However, Infobuttons were developed mainly to assist physicians and were not designed for patients. PERSIVAL is another physician-centric system; it accepts user-provided queries to retrieve personalized results from a patient care library (McKeown et al. 2001). Probabilistic topic modeling has also been utilized to recommend education materials to patients with diabetes (Kandula et al. 2011): topics are built from consumer health texts, and education materials are ranked according to the frequencies of terms and topics in a given EHR note. The authors show that the top two recommended documents are significantly more relevant than a randomly selected document from the same domain. Doupi and van der Lei (2002) use structured EHR data to retrieve health-related information. An early system was designed to provide personalized health information from a knowledge base by filling manually created templates (Cawsey et al. 2000).

Research in domain-specific information retrieval is also closely related to our work. In these searches, it is common to use a document as the basis for queries. Patent retrieval (Fujii et al. 2007), for example, has been widely studied. Patent documents are generally long and complex, necessitating methods that generate shorter queries. For example, words in the summary section of a patent document can be ranked by TF-IDF scores and extracted to form a query (Xue and Croft 2009), as sketched below. Sentences that are similar to pseudo-relevant documents according to a language model have also been used to reduce query length (Ganguly et al. 2011). Other similarity measures, such as Kullback–Leibler divergence, have been used to extract terms, which are then expanded to generate queries in the patent retrieval domain (Mahdabi et al. 2012). However, patent retrieval is recall-driven, whereas in our scenario patients are generally not expected to read relevant education documents exhaustively.
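For illustration, the following minimal sketch shows the TF-IDF extraction idea: rank the words of a long document against a background collection and keep the top few as a short query. The toy collection and simple token-level weighting are our own simplifications, not the setup of Xue and Croft (2009).

```python
# Toy TF-IDF query extraction: score each word of a long document by
# term frequency times inverse document frequency over a background
# collection, then keep the top-k words as a short query.
import math
from collections import Counter

def tfidf_query(doc_text, collection, k=10):
    """Return the k highest-TFIDF words of doc_text as a query string."""
    n_docs = len(collection)
    df = Counter()
    for d in collection:
        df.update(set(d.lower().split()))
    tf = Counter(doc_text.lower().split())
    scores = {w: tf[w] * math.log((n_docs + 1) / (df[w] + 1)) for w in tf}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return " ".join(top)

collection = ["the patient denies chest pain",
              "diabetes mellitus follow up",
              "the note is long"]
print(tfidf_query("patient with diabetes mellitus and chest pain today",
                  collection, k=4))
```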

Beyond patent retrieval, various methods have been proposed to retrieve documents relevant to passages of text or web documents. A model extended from CRF has been proposed to identify noun phrases and named entities from a user-selected passage as queries (Lee and Croft 2012). Similarly, noun phrases in a verbose query have been used as candidates for key concepts (Bendersky and Croft 2008). Other related work on reducing long queries includes ranking all subsets of the original query (Kumaran and Carvalho 2009). However, the passages and verbose queries in these systems are shorter than typical EHR notes, which makes the graphical model and other learning-based models less efficient. Moreover, parsers and named entity recognizers for the medical domain are less effective than those for the general domain. Pseudo-relevant documents have also been exploited to identify concepts for query generation (Kim 2014).

Information retrieval in the biomedical domain is also related to this work. WRAPIN is a system that analyzes web pages and retrieves related health documents (Gaudinat et al. 2006). The system is limited by its design: the health document sources are indexed only by MeSH terms and their synonyms. Our system does not require indexing the document collection with ontology sources, thus eliminating the computationally expensive extraction of MeSH terms. Most IR systems in the biomedical domain are developed to help physicians and researchers. A review article by Pluye et al. (2005) states that one third of searches may have a positive impact on physicians. A full-text index of EHR notes and query-based IR allowed healthcare providers to perform tasks such as medical management of patients, medical research, and improving the traceability of medical care in medical records (Biron et al. 2014). LAILAPS, a life science IR system, utilizes query expansion and suggestion to improve retrieval results (Esch et al. 2014). Another study also found query expansion helpful in retrieving biomedical documents from a subset of MEDLINE (Rivas et al. 2014). Query expansion using a large, in-domain clinical corpus has been reported to be useful for patient cohort identification (Zhu et al. 2014). The CLEF eHealth challenge (Kelly et al. 2014) includes a task to retrieve information addressing questions patients may have when reading clinical reports. This task provides participants with expert-formulated concise queries for one central disorder in discharge summaries (Goeuriot et al. 2014). In our study, we aim to generate queries from long EHR notes without the help of experts. The TREC Clinical Decision Support Track is another information retrieval challenge involving EHR notes. A number of participants extracted terms from the query descriptions exhaustively using external knowledge bases and expanded them with synonyms defined in medical ontologies. Relevance feedback is also a popular technique among the participating systems; unlike our method, which filters the pseudo-relevant documents, some systems use manual judgments or simply the top documents. The task is designed to address physicians’ information needs of diagnosing the condition, further testing, and treating the patients, rather than patients’ need for education materials. Case reports are provided as query descriptions, which can be shorter and more focused than an EHR note.

Contributions of this work are: (1) we designed approaches to generate effective queries from long and noisy EHR data, resulting in significant performance improvements over the baseline, and (2) we built a corpus of EHR-note-specific education materials.

3 Materials

3.1 Education materials

MedlinePlus provides current and reliable information about over 900 health topics in consumer-oriented lay language. Additionally, its medical encyclopedia section includes over 7000 articles about diseases, tests, symptoms, injuries, and surgeries. We include in this study the textual narratives in the “health topics”, “drugs, supplements, and herbal information”, and “medical encyclopedia” sections of MedlinePlus as the collection of education materials. There are approximately 9400 articles in total in this collection, which we designate as MedlinePlus. Table 1 summarizes the characteristics of the collection.

Table 1 MedlinePlus collection

We index the MedlinePlus documents with Galago (Croft et al. 2010) (version 3.5), an advanced open source search engine. Galago implements the inference network retrieval model (Turtle and Croft 1991), which represents queries and documents in a directed acyclic graph and calculates the probability that a document satisfies the user’s information need. This framework has been applied in many information retrieval tasks and shown to be successful (Metzler et al. 2004).

3.2 De-identified EHR corpus

Some of the models in our system were trained on de-identified EHR data: a collection of 6718 progress notes containing 1 million tokens. This corpus is used to learn the topic models.

3.3 Evaluation data and metrics

Twenty de-identified EHR progress notes were randomly selected to test our systems’ performance. Each note contains 261 tokens on average, with a standard deviation of 133. A physician read each note and manually identified relevant education materials from the MedlinePlus documents, with the help of the built-in search function on the MedlinePlus website. Each EHR note is linked to 22 education material documents on average. This collection of 20 EHR notes annotated with education materials is then used as the gold standard for evaluating our NoteAid IR systems. As an example, Table 2 shows the summary of one EHR note and some of its relevant MedlinePlus documents.

Table 2 Example EHR Note and its relevant documents

To evaluate the IR systems, we use the Mean Average Precision (MAP) metric, a common standard in the IR community for evaluating ranked retrieval results. Set-based measures such as precision and recall cannot account for the order in which results are presented in a ranked retrieval context.

Average Precision (AveP) for each test note is the average of the precision at each rank where a relevant document is found:

$$AveP(R,D_k) = \frac{\sum _{d_i \in D_k \cap R}P(R,D_i)}{|R|},$$
(1)

where R is the gold standard, \(D_k\) is the top k retrieved results, \(d_i\) is the ith ranked result in \(D_k\), and \(D_i\) is the set of results from rank 1 to rank i. \(P(R,D_i)\) is the precision score of the \(D_i\) documents:

$$P(R,D_i) = \frac{|R \cap D_i|}{|D_i|}.$$
(2)

Then, for a given set of queries Q, MAP can be calculated as

$$MAP(Q) = \frac{\sum _{q \in Q} AveP(R_q, D_{k,q})}{|Q|},$$
(3)

where q is a query in Q, \(R_q\) is the gold standard results of q, and \(D_{k,q}\) is the top k retrieved results of q.

Another metric that we use to evaluate system performance is precision at 10 (P@10), which measures the precision of the top 10 retrieval results. Although it does not distinguish the order of the results, it is a useful metric because patients are unlikely to read more than a few related documents. Both metrics can be computed as in the sketch below.
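The sketch below follows Eqs. (1)–(3) directly, computing average precision, MAP, and P@10 over hypothetical query results and gold standard sets.

```python
# Evaluation metrics from Eqs. (1)-(3). `results` maps each query id to
# its ranked document ids; `gold` maps each query id to the set of
# relevant document ids (both hypothetical).

def average_precision(ranked, relevant):
    """Eq. (1): mean of P(R, D_i) at each rank where a relevant doc appears."""
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / i          # P(R, D_i), Eq. (2)
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(results, gold):
    """Eq. (3): mean of AveP over the query set."""
    return sum(average_precision(results[q], gold[q]) for q in gold) / len(gold)

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

gold = {"note1": {"d2", "d5"}}
results = {"note1": ["d9", "d2", "d7", "d5", "d1"]}
print(mean_average_precision(results, gold))            # 0.5
print(precision_at_k(results["note1"], gold["note1"]))  # 0.2
```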

4 Methods

We develop two strategies to link EHR notes to external education materials.

4.1 Baseline approaches

4.1.1 Baseline

The first is based on traditional IR, in which we use the entire text of an EHR note as the query to retrieve relevant education materials. This strategy is used as the baseline system.

4.1.2 Full text with CHV

Since EHR text is not patient-oriented, to narrow the gap between medical language and lay language we substituted medical jargon with the consumer-oriented counterparts created by the Consumer Health Vocabulary (CHV) Initiative (Zeng and Tse 2006). The EHR notes were first processed by MetaMap (Aronson 2001) to recognize medical concepts. Recognized concepts that have a corresponding lay term in CHV were then replaced. To limit concepts to domain-specific medical terms, we filtered the MetaMap-recognized concepts to the following semantic types, as defined in the Unified Medical Language System (UMLS) (Bodenreider 2004): acquired abnormality, antibiotic, cell or molecular dysfunction, clinical attribute, diagnostic procedure, disease or syndrome, experimental model of disease, finding, laboratory procedure, laboratory or test result, organ or tissue function, pathologic function, physiologic function, pharmacologic substance, sign or symptom, and therapeutic or preventive procedure. The substituted EHR notes were issued as queries to the system. A sketch of the substitution step is shown below.
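This sketch makes several stated assumptions: the MetaMap output is a hard-coded list of (surface form, CUI, semantic type) triples and the CHV mapping is a toy dictionary; a real pipeline would call MetaMap and load the full CHV resource.

```python
# Hedged sketch of the CHV substitution step. `metamap_concepts` stands
# in for real MetaMap output and `chv` for the CHV jargon-to-lay mapping;
# both are illustrative, not the actual resources.

# A few UMLS semantic type abbreviations: dsyn = disease or syndrome,
# sosy = sign or symptom, diap = diagnostic procedure,
# phsu = pharmacologic substance.
ALLOWED_TYPES = {"dsyn", "sosy", "diap", "phsu"}

metamap_concepts = [("hypertension", "C0020538", "dsyn"),
                    ("dyspnea", "C0013404", "sosy")]
chv = {"C0020538": "high blood pressure", "C0013404": "shortness of breath"}

def substitute_chv(note, concepts, chv_map):
    """Replace recognized jargon with its CHV lay term, if one exists."""
    for surface, cui, sem_type in concepts:
        if sem_type in ALLOWED_TYPES and cui in chv_map:
            note = note.replace(surface, chv_map[cui])
    return note

note = "Patient with hypertension reports dyspnea on exertion."
print(substitute_chv(note, metamap_concepts, chv))
# -> Patient with high blood pressure reports shortness of breath on exertion.
```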

4.2 Topic models

In the second strategy, we investigate several query generation approaches, whereby short queries are built from an EHR note to retrieve relevant education materials.

In these approaches and the ones described in the following sections, the sequential dependence model (Metzler and Croft 2005) was used to capture the dependencies in multi-word query terms. In this model, given a query, documents are ranked based on features of the document containing a single query term, two query terms appearing sequentially, and two query terms appearing in any order within a window. This model has been shown to be effective in many applications (Balasubramanian et al. 2007; Cartright et al. 2011; Bendersky et al. 2009). An illustrative query string construction follows.
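As an illustration only, a set of generated concept phrases might be wrapped in Galago's sequential dependence operator as below; operator syntax varies across Galago versions, so treat the query string as a sketch rather than the system's actual output.

```python
# Sketch of issuing generated concepts as a sequential dependence query.
# Galago's query language provides an #sdm operator that scores unigrams,
# ordered bigrams, and unordered bigram windows; here each multi-word
# concept is wrapped in #sdm and the parts are combined. Illustrative only.

def sdm_query(concepts):
    parts = ["#sdm({})".format(c) for c in concepts]
    return "#combine({})".format(" ".join(parts))

print(sdm_query(["chronic kidney disease", "type 2 diabetes"]))
# -> #combine(#sdm(chronic kidney disease) #sdm(type 2 diabetes))
```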

4.2.1 LDA

Full EHR notes typically discuss several aspects of the patient’s condition, including diagnoses, medications, procedures, etc. We trained Latent Dirichlet Allocation (LDA) topic models (Blei et al. 2003) on over 6000 de-identified EHR notes to infer topics from the test notes. Three models were learned with 20, 50, and 100 topics, of which the 100-topic model performed best.

Traditional LDA models extract distributions over individual word tokens for each topic. However, medical concepts often contain more than one token. We therefore employed turbo topics (Blei and Lafferty 2009) to find phrases within these topics; this method builds significant n-grams based on a language model of arbitrary-length expressions. To translate the topics into queries, we first performed inference on the test notes to find the topic mixture, and then took the top 5 phrases from the most likely topics whose combined probability exceeds 80 %. A sketch of this topic-to-query step follows.
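The sketch below uses gensim; gensim, the toy corpus, and plain unigram topics stand in for the authors' own LDA and turbo-topics pipeline.

```python
# Minimal topic-to-query sketch with gensim: infer a note's topic mixture,
# keep the most likely topics until their combined probability exceeds 0.8,
# and take the top words of each kept topic as query terms.
from gensim import corpora, models

train_notes = [["hypertension", "lisinopril", "blood", "pressure"],
               ["diabetes", "insulin", "glucose", "a1c"],
               ["asthma", "albuterol", "wheezing", "inhaler"]]
dictionary = corpora.Dictionary(train_notes)
corpus = [dictionary.doc2bow(n) for n in train_notes]
lda = models.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=20)

test_note = ["hypertension", "blood", "pressure", "glucose"]
mixture = sorted(lda.get_document_topics(dictionary.doc2bow(test_note)),
                 key=lambda t: -t[1])

query_terms, mass = [], 0.0
for topic_id, prob in mixture:
    if mass >= 0.8:                      # stop once combined prob. > 80 %
        break
    mass += prob
    query_terms += [w for w, _ in lda.show_topic(topic_id, topn=5)]
print(" ".join(query_terms))
```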

4.2.2 LDA on concepts

To concentrate on medical terms, we trained another LDA model solely on the medical concepts contained in the EHR notes. The same de-identified EHR notes used in the LDA approach were first processed to find medical terms, in the same way as described in Full text with CHV. The notes were then converted to collections of UMLS Concept Unique Identifiers (CUIs) corresponding to the medical terms recognized by MetaMap, disregarding the textual content. These converted notes were the training documents for the new LDA model. Topics were inferred on the 20 EHR test notes after they were processed in the same way. The top 5 CUIs from the most likely topics (with combined probability over 80 %) are mapped back to phrases in UMLS and issued as queries.

4.3 Key concept identification

4.3.1 IDF-filtered concepts

We also focus on the medical concepts more directly by selecting top concepts based on their inverse document frequency (IDF) in the EHR note corpus used to learn the LDA models. In a large corpus, concepts that occur in a small number of documents are generally more unique to the document being analyzed; in an EHR note, these concepts are presumably more important for the patient. Therefore, we selected the 10 concepts with the highest IDF (i.e., the lowest document frequency) from each note to construct a query, as sketched below.
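A minimal sketch of the IDF filter, with toy concept sets standing in for the MetaMap output and the background EHR corpus:

```python
# Score each recognized concept by its inverse document frequency in a
# background corpus and keep the rarest (highest-IDF) concepts as the query.
import math

def idf_filtered_query(note_concepts, corpus_concepts, k=10):
    n = len(corpus_concepts)
    def idf(c):
        df = sum(1 for doc in corpus_concepts if c in doc)
        return math.log((n + 1) / (df + 1))
    ranked = sorted(set(note_concepts), key=idf, reverse=True)
    return ranked[:k]

corpus_concepts = [{"hypertension", "aspirin"},
                   {"hypertension", "sarcoidosis"},
                   {"hypertension"}]
print(idf_filtered_query(["hypertension", "sarcoidosis"], corpus_concepts, k=1))
# -> ['sarcoidosis']  (rarer in the corpus than hypertension)
```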

4.3.2 Key concepts (semantic relations)

Although an EHR note may discuss several disorders, symptoms, and other medical concepts, not all are equally important. There are usually a few key concepts and several related concepts. To identify the key concepts, we adopted an unsupervised approach that exploits the semantic relations among the medical concepts to identify the few that are related to many of the others. We first built a graph of the semantic relations, as defined in the UMLS Metathesaurus, among the medical concepts in a note. The 5 most connected concepts were then selected and used as query terms. A sketch of this approach follows.
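The sketch below illustrates the degree-based selection; the relation pairs are hypothetical stand-ins for a UMLS Metathesaurus lookup.

```python
# Build a graph whose nodes are the note's concepts and whose edges are
# UMLS-defined relations between them, then keep the most connected
# (highest-degree) concepts as query terms.
from collections import Counter

note_concepts = ["pneumonia", "cough", "fever", "amoxicillin", "rash"]
umls_relations = [("pneumonia", "cough"), ("pneumonia", "fever"),
                  ("pneumonia", "amoxicillin")]   # illustrative only

def key_concepts_by_degree(concepts, relations, k=5):
    degree = Counter()
    for a, b in relations:
        if a in concepts and b in concepts:
            degree[a] += 1
            degree[b] += 1
    return [c for c, _ in degree.most_common(k)]

print(key_concepts_by_degree(note_concepts, umls_relations, k=2))
# -> ['pneumonia', 'cough']  (pneumonia is the hub of the relation graph)
```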

4.3.3 Key concepts (CRF)

We also explored two supervised methods to identify key concepts from the EHR notes. A physician independently assigned relevant MedlinePlus documents to the 20 EHR notes (details are described in Sect. 3.3). Training data was generated by marking as key concepts those phrases that match the title of any relevant MedlinePlus document. A Conditional Random Fields (CRF) model (Lafferty et al. 2001) was then trained to predict the key concepts using leave-one-out cross validation. We explored lexical, morphological, word shape, UMLS semantic type, and section information as learning features; a sketch of such a tagger is shown below.
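A hedged sketch of such a BIO tagger using sklearn-crfsuite; the toolkit and the exact features are our assumptions, as the paper specifies only the feature categories.

```python
# Sketch of a CRF key-concept tagger: each token gets a feature dict
# (lexical, word shape, and section features here), and key concepts are
# labeled in BIO form. The one-sentence training set is a toy example.
import sklearn_crfsuite

def word_features(tokens, i, section):
    w = tokens[i]
    return {"lower": w.lower(), "is_title": w.istitle(),
            "is_digit": w.isdigit(), "suffix3": w[-3:],
            "section": section}

def featurize(tokens, section):
    return [word_features(tokens, i, section) for i in range(len(tokens))]

# Toy training data: "atrial fibrillation" marked as a key concept.
X = [featurize(["Patient", "has", "atrial", "fibrillation", "."], "assessment")]
y = [["O", "O", "B-KEY", "I-KEY", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```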

4.3.4 Key concepts (MetaMap + SVM)

A binary Support Vector Machine (SVM) model (Cortes and Vapnik 1995) was also learned to classify each concept identified by MetaMap (as in Full text with CHV) as a key or non-key concept. Features of this model included the detailed semantic type, the tree level of the semantic type in the UMLS semantic network hierarchy, the coarse semantic type, the inverse document frequency of the concept learned from a medical note corpus, and the section in which the concept occurs. Since positive instances are far outnumbered by negative ones, the penalty for misclassifying a positive example was tuned to maximize the F1 score, as sketched below.
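A hedged sketch of the penalty tuning with scikit-learn; the random feature vectors stand in for the semantic type, IDF, and section features, and the class-weight grid is illustrative.

```python
# Linear SVM for imbalanced key/non-key concept classification: tune the
# misclassification penalty of the positive (key) class to maximize F1.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # 5 hypothetical features
y = (rng.random(200) < 0.1).astype(int)       # ~10 % positive (key) concepts

grid = GridSearchCV(SVC(kernel="linear"),
                    {"class_weight": [{1: w} for w in (1, 5, 10, 20)]},
                    scoring="f1", cv=3)
grid.fit(X, y)
print(grid.best_params_)
```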

4.3.5 All concepts

Due to the sparseness of the automatically predicted key concepts, many relevant education documents may be missed. We thus extracted all concepts from an EHR note using MetaMap and constructed a query from them.

4.4 Two stage queries

We evaluated a two-stage design that first retrieves a small number of documents (we empirically set the threshold to 20) using the key concepts, and then complements them with more documents from the full set of concepts for broader coverage. The second query’s results are ranked behind those from the first query, with duplicates removed. This approach combines the broader concept coverage of the full-set approach with the higher precision of the key concept approach, as sketched below.
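A minimal sketch of the two-stage merge, with a dictionary standing in for calls to the search engine:

```python
# Two-stage retrieval: take the key-concept results first (cut off at 20),
# then append the all-concepts results with duplicates removed.

def two_stage(search, key_query, all_query, first_stage_k=20):
    first = search(key_query)[:first_stage_k]
    seen = set(first)
    second = [d for d in search(all_query) if d not in seen]
    return first + second

# Toy stand-in for the search engine:
index = {"q_key": ["d1", "d2"], "q_all": ["d2", "d3", "d4"]}
print(two_stage(lambda q: index[q], "q_key", "q_all"))
# -> ['d1', 'd2', 'd3', 'd4']
```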

4.5 Query expansion

We explored query expansion methods using co-occurrences: the concepts that most frequently co-occur with the identified key concepts are included in the queries as expansions. The co-occurrence data is extracted from patient records in the Canonical Clinical Problem Statement System distributed with UMLS.

Finally, we experimented with pseudo-relevance feedback, where the top retrieval results from the key concept query are analyzed and concepts from these documents are ranked to expand the query. The initial queries are generated using the methods described previously. Among the top 20 retrieval results, those whose titles match at least one of the identified key concepts are considered pseudo-relevant documents; this additional requirement ensures that the expanded concepts do not drift from the main topics of the medical note. From these pseudo-relevant documents, medical concepts and their corresponding CUIs are extracted. The CUIs are ranked by their TF-IDF scores, and the concept names of the top-ranked CUIs are added to the original query as expansions. A sketch of this feedback step follows.
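In the sketch below, each retrieved document carries a pre-extracted concept list standing in for a MetaMap pass over its text.

```python
# Pseudo-relevance feedback sketch: among the top results, keep documents
# whose titles match a key concept, rank their concepts by TF-IDF over the
# pseudo-relevant set, and append the top concepts to the query.
import math
from collections import Counter

def prf_expand(query_concepts, top_docs, n_expansions=3):
    pseudo_rel = [d for d in top_docs
                  if any(c.lower() in d["title"].lower() for c in query_concepts)]
    tf = Counter(c for d in pseudo_rel for c in d["concepts"])
    n = len(top_docs)
    def tfidf(c):
        df = sum(1 for d in top_docs if c in d["concepts"])
        return tf[c] * math.log((n + 1) / (df + 1))
    ranked = sorted(tf, key=tfidf, reverse=True)
    return query_concepts + ranked[:n_expansions]

docs = [{"title": "Hypertension", "concepts": ["blood pressure", "stroke"]},
        {"title": "Travel tips", "concepts": ["luggage"]}]
print(prf_expand(["hypertension"], docs))
# -> ['hypertension', 'blood pressure', 'stroke']
```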

5 Experiment results and discussion

5.1 Baseline approaches

Our traditional IR-based baseline issues each of the 20 test EHR notes (Sect. 3.3) as the query to the MedlinePlus document index and retrieves the top 500 documents. The performance is shown in Table 3, row 1.

Table 3 System performance

The MAP using Full text with CHV, shown in Table 3, row 2, more than doubled that of the baseline. The gap between medical language and lay language highlights the difficulty patients may have in finding relevant health information without assistance.

We found that the top 10 retrieved results of the baseline system are nearly identical across the EHR notes, with minimal variation in order, and that none of the top 10 retrievals is a relevant document according to our gold standard. These results are not surprising. EHR notes are written by physicians and contain domain-specific medical jargon, whereas consumer-oriented education materials are written in lay language, a different text genre. In addition, the full text of an EHR note may contain so much noise that the distinguishing content is drowned out. For example, the EHR sentence “I am glad to see Ms. Smith today” provides little information other than the gender of the patient, which could be identified from other parts of the note anyway. Search engines are not optimized to process queries of over 500 tokens and cannot automatically filter out the noise without significant adaptations. The unique language and style of these medical notes make the filtering all the more difficult.

5.2 Topic models

The results of LDA and LDA on concepts are shown in Table 3, rows 3 and 4. The improvement of LDA over the baseline is statistically significant (paired t-test, \(p<0.05\)). The improvement of LDA on concepts over the baseline is also statistically significant under the same test.

Table 4 shows the top 10 n-grams from 7 topics trained on the medical text. While topics like the first capture medical concepts, others like the second do not. The LDA results also highlight the noisy nature of the EHR notes: queries formed by including generic or noisy terms such as “continue on” will not benefit retrieval. Examining the retrieval results, we found that when the most prominent topics (those with a combined probability over 80 %) include medical concepts, the top 10 results usually contain at least one relevant document (sometimes as the first result). When only generic topics are identified, relevant documents are absent from the top 10 results. For instance, in the example EHR note shown in Table 2, topic 1 from Table 4 is one of the prominent topics. On the other hand, none of the top 10 results is relevant for a note for which only topic 2 is identified as prominent.

Figure 1 shows, for each topic, the proportion of documents in which the topic is identified as prominent. It is clear that the prominent topics differ markedly between the EHR notes and the MedlinePlus collection. For example, one topic that is prominent in over 90 % of the MedlinePlus documents includes phrases such as “history of”, “therapy”, “left”, “time”, “just”, “day”. Another topic that is prominent in nearly 20 % of the EHR notes but less than 1 % of the MedlinePlus documents includes phrases such as “bactrim”, “hyperglycemia”, “bronchitis”, “asthmatic”.

Table 4 Top 10 n-grams from seven topics using the LDA model trained on 6000 EHR notes
Fig. 1 Proportion of prominent topics in EHR notes and MedlinePlus documents

As shown in Table 3, the LDA model trained on UMLS concepts did not outperform the one trained on the full note text. This drop in performance may be partly due to noise introduced by the MetaMap system. Analysis of the retrieval results shows that this model identified 2 more topics per EHR note on average, or 8 more terms in the resulting query, than the textual LDA model. This increase in the number of query terms could dilute the probability mass assigned to each term, rendering the query less effective. In the example document from Table 2, the textual LDA model generated 17 terms whereas the UMLS-concept-based LDA model generated 32 terms. In addition, some query concepts, such as Carney’s syndrome, are not present in our MedlinePlus document collection.

5.3 Key concept identification

The system performance using IDF-filtered concepts and Key concepts (semantic relations) is shown in Table 3, rows 5 and 6. Compared to the baseline and the topic-model-based methods, these experiments show that medical concepts are very effective query terms.

The key concept identification performance of the two supervised methods, under leave-one-out cross validation, is shown in Table 5. Retrieval performance using the generated queries is shown in Table 3, rows 7 and 8, while performance using All concepts is shown in row 9.

Table 5 Key concept identification performance

We note that the performance of Key concepts (semantic relations) is lower than that of either All concepts or IDF-filtered concepts. This can be attributed to two reasons. First, the relationships defined in UMLS are rare in the medical notes: the concept graph contains many singleton nodes and subgraphs with few relations. Second, generic terms are often highly connected. For example, “complaints” is related to many symptoms, including “nausea”, “vomiting”, “abdominal pain”, etc. The query then necessarily includes “complaints” as one of its terms, resulting in low-precision retrieval.

Key concepts (CRF) shows a statistically significant improvement over the LDA on concepts system. The improvement over the LDA system is smaller and not statistically significant, although this model achieves nearly double the MAP of the LDA model.

One drawback of this approach is that the identified key phrases fail to cover the full scope of the medical concepts contained in an EHR note. First, since key phrases are sparse in the EHR notes, the CRF model cannot learn from enough examples, and thus only approximately a quarter of them are extracted. Second, this sparseness leads to low coverage of the full set of concepts in an EHR note; on average, only 23.6 % of the concepts in each note are annotated as key phrases. In the example document in Table 2, only four of the seven key concepts are identified by the CRF model.

Comparing the two machine-learning-based approaches, the SVM model outperforms the CRF model by 19.5 % in F1 score (Table 5). Part of the reason is that the CRF model must correctly identify concept boundaries while the SVM classifier does not, making the SVM’s task slightly easier. The increase in F1 is largely due to the higher recall of the SVM model. Although the precision of the SVM model is lower than that of the CRF model, the actual queries rarely contain more than 6 concepts, so a single error can produce a difference of over 15 percentage points in the precision score.

The All concepts system outperformed the semantic-relations-based system and obtained a slightly better MAP score than the CRF-based system. It is not surprising that all-concept queries performed better than the semantic-relation-based queries, because the semantic relations did not capture the key concepts; on the contrary, many of the selected concepts are generic terms such as “discomfort”, “issues”, “illness”, and “history”.

Compared with the CRF-based approach, the full set of medical concepts enables the system to achieve slightly better performance by retrieving more relevant documents, albeit at low ranks in the retrieval results. The MetaMap + SVM based approach, despite its lower MAP score, outperformed the All concepts approach by 30 % in P@10, linking more relevant documents at the top of the retrieval results.

5.4 Two stage queries

The Two stage system results are shown in Table 3, rows 10 and 11. This approach outperforms all the other approaches, and using Key concepts (MetaMap + SVM) as the initial query performs better than using Key concepts (CRF). The improvement is statistically significant over the systems that do not use key concept identification.

5.5 Query expansion

The expansion technique using co-occurring concepts did not prove beneficial: its MAP score of 0.0971 is merely comparable to using all the concepts. The main reason for the low performance lies in the nature of the co-occurrence data. For instance, a symptom can be observed in a multitude of different diseases, and symptoms and diseases can occur at various locations in the human body; these different relationships are all grouped as a single co-occurrence relationship, making the expansions too diverse to be relevant. Examples of suboptimal expansions include “malnutrition” expanded with “anemia”, “nausea” expanded with “constipation”, and “bronchitis” expanded with “AIDS”.

Incorporating pseudo-relevance feedback, on the other hand, improves some of the previous approaches. System performance is shown in Table 6. When using initial queries from the SVM-based method, this approach outperforms all the other methods in both MAP and P@10.

Table 6 System performance with pseudo-relevance feedback

6 Conclusions, limitations and future plan

It has been shown that accessing their own EHR can enhance patients’ understanding and provide clinically relevant benefits. However, the difficult language in EHR notes and limited average health literacy present a challenge. To address these problems, we are developing NoteAid to retrieve EHR-note-specific, and therefore patient-tailored, online consumer-oriented health education materials.

In our experiments, we have shown that using the full text of an EHR note is ineffective for retrieving relevant education materials. Identifying the key concepts of an EHR note and using them as query terms results in significantly improved performance. Furthermore, a query expansion approach in which key concepts are complemented by other medical concepts from pseudo-relevant documents outperforms the other approaches, such as topic models or a simple aggregation of medical concepts as queries.

One limitation of our design is that only one physician provided relevance judgments. Additional annotators would provide a more rigorous gold standard and allow us to measure inter-annotator agreement. We would also benefit from pooling results from multiple search engines to improve the gold standard. Secondly, our education material collection includes only the MedlinePlus documents, which may not cover all the topics a user needs.

There are several directions to explore in future research. First, our key concept identification methods are optimized for the identification subtask only, not for the retrieval results; we hypothesize that directly optimizing the key concept identifier for retrieval would lead to better performance. In addition, we will explore better query expansion methods to improve the current query generation methods. Finally, we plan to evaluate whether patients’ comprehension improves when their EHR notes are linked to education materials.