Evaluating automated entity extraction with respect to drug and non-drug treatment strategies

https://doi.org/10.1016/j.jbi.2019.103177
Under an Elsevier user license (open archive)

Highlights

  • MetaMap performs better on drug than non-drug treatments (F1 = 0.77 vs. 0.64).

  • The combination approach outperforms MetaMap alone for total treatments (F1 = 0.76 vs. 0.72).

  • Non-drug treatments benefit most from the combination approach.

Abstract

Objectives

The treatment used in a randomized clinical trial is a critical data element both for physicians at the point of care and for reviewers evaluating different interventions. Much of the existing work on extracting treatments from the biomedical literature has focused on pharmacological interventions. Non-pharmacological interventions (e.g., exercise and diet), which are frequently used to address chronic conditions, are less well studied. The goal of this study is to compare knowledge-based and machine learning strategies for extracting both drug and non-drug treatments.

Methods

We collected 800 randomized clinical trial abstracts from PubMed for each of two conditions, breast cancer and diabetes. The treatments in the results/conclusion sentences of the abstracts were manually annotated and labeled as drug or non-drug treatments. We then designed three methods to identify treatments and evaluated each method separately on drug and non-drug treatments. The first method relies solely on a knowledge base (here, MetaMap). The second method is a machine learning model trained mainly on contextual features (ML_only). The third method is a combination approach that integrates the first two.
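Evaluating each method separately on drug and non-drug treatments amounts to scoring predicted treatment spans against the annotated spans within each category. The abstract does not state the matching criterion, so the sketch below assumes exact span matching on character offsets. The `Span` layout, the function names, and the assumption that every predicted span also carries a drug/non-drug label are hypothetical.

```python
from collections import namedtuple

# Hypothetical span layout: abstract id, character offsets, and a
# "drug" / "non-drug" category label (assumed present on predictions too).
Span = namedtuple("Span", ["doc_id", "start", "end", "category"])


def prf(gold_keys, pred_keys):
    """Exact-match precision, recall, and F1 between two sets of span keys."""
    tp = len(gold_keys & pred_keys)
    precision = tp / len(pred_keys) if pred_keys else 0.0
    recall = tp / len(gold_keys) if gold_keys else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def span_keys(spans):
    # Two spans match only if the abstract and both offsets are identical.
    return {(s.doc_id, s.start, s.end) for s in spans}


def evaluate_by_category(gold, pred):
    """Score all treatments together, then drug and non-drug treatments separately."""
    scores = {"total": prf(span_keys(gold), span_keys(pred))}
    for category in ("drug", "non-drug"):
        scores[category] = prf(
            span_keys(s for s in gold if s.category == category),
            span_keys(s for s in pred if s.category == category),
        )
    return scores


# Toy example with made-up offsets (not data from the study).
gold = [Span(1, 10, 19, "drug"), Span(1, 40, 48, "non-drug"), Span(2, 5, 12, "non-drug")]
pred = [Span(1, 10, 19, "drug"), Span(2, 5, 12, "non-drug"), Span(2, 30, 35, "drug")]
scores = evaluate_by_category(gold, pred)
for name, (p, r, f1) in scores.items():
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy example all three F1 scores come out to 0.67, but precision and recall trade off in opposite directions for the drug and non-drug spans, which is the kind of difference a per-category breakdown is meant to expose.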

Results/discussion

Results show that MetaMap, when used with high-precision semantic types, performs better on drug than on non-drug treatments (F1 = 0.77 vs. 0.64). The ML_only approach shows a smaller gap between drug and non-drug treatments than the knowledge-based (MetaMap) approaches (difference in F1 of 0.02 vs. 0.05, 0.07, and 0.13). The combination approach performs significantly better than any MetaMap approach alone for total treatments (F1 = 0.76 vs. 0.72, p < 0.001). The gain comes mainly from non-drug treatments (0.03–0.08 improvement in F1), while drug treatments benefit little from the combination approach (0–0.03 improvement in F1).
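The abstract reports p < 0.001 for the combination approach versus MetaMap but does not name the statistical test. A common choice for comparing F1 between two systems scored on the same documents is a paired approximate randomization test; the sketch below illustrates that technique and is not necessarily the authors' procedure. The per-abstract (tp, fp, fn) input format and the function names are assumptions.

```python
import random


def micro_f1(counts):
    """Micro-averaged F1 from per-abstract (tp, fp, fn) counts."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def approximate_randomization(counts_a, counts_b, trials=10_000, seed=0):
    """Two-sided paired approximate randomization test on the F1 difference.

    counts_a[i] and counts_b[i] hold the (tp, fp, fn) counts of systems A and B
    on the same abstract i.
    """
    rng = random.Random(seed)
    observed = abs(micro_f1(counts_a) - micro_f1(counts_b))
    extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(counts_a, counts_b):
            # Under the null hypothesis the two systems are exchangeable,
            # so swap their outputs on each abstract with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(micro_f1(shuffled_a) - micro_f1(shuffled_b)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (trials + 1)
```

Because the swap is done per abstract, the test respects the pairing of the two systems' outputs on the same documents; with 10,000 trials the smallest p-value it can report is about 0.0001.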

Conclusion

These results suggest that a knowledge-based approach should be used for medical conditions that are treated primarily with drugs, whereas for conditions treated with a combination of drug and non-drug interventions, or primarily with non-drug interventions, automated tools that combine machine learning with a knowledge-based approach should be used to achieve optimal performance.

Keywords

Treatment extraction
Entity recognition
MetaMap
Machine learning
