Semi-Supervised Information Extraction for Cancer Pathology Reports
- ORNL
- LSUHSC-Louisiana Tumor Registry
Pathology reports are a primary source of data for cancer surveillance programs. Manual coding of these reports is labor-intensive, but it is necessary for obtaining the labeled data used to train automated information extraction systems. In this study, we investigated whether semi-supervised deep learning can improve the performance of a multitask information extraction system for automated annotation of pathology reports. We used a set of over 374,000 pathology reports from the Louisiana Tumor Registry and a novel convolutional attention-based auto-encoder. We performed a set of experiments comparing supervised training augmented with unlabeled data at 1%, 5%, 10%, and 50% of the original data size. We also examined the impact of extending text processing to include unlabeled tokens. We found that semi-supervised training consistently improved individual task performance, increasing micro-averaged F-scores by between 0.012 and 0.064 and macro-averaged F-scores by up to 0.158. This demonstrates that semantic information learned via unsupervised learning can be used to improve supervised clinical task performance.
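The abstract reports gains in both micro-averaged and macro-averaged F-scores, and the macro gain is the larger of the two. The sketch below is an illustration only, not the authors' evaluation code: it computes both averages for single-label predictions on a toy example. The class names `common` and `rare` and the toy label lists are hypothetical. It shows how recovering even one example of a rare class can move the macro average more than the micro average, because the macro average weights every class equally.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: 8 reports of a common class, 2 of a rare class.
y_true = ["common"] * 8 + ["rare"] * 2
baseline_pred = ["common"] * 10                    # never predicts the rare class
improved_pred = ["common"] * 8 + ["rare", "common"]  # recovers one rare report

base = f_scores(y_true, baseline_pred)
improved = f_scores(y_true, improved_pred)
print("baseline (micro, macro):", base)
print("improved (micro, macro):", improved)
```

In this toy case the macro-averaged score rises by more than the micro-averaged score, which is the same direction of effect the abstract reports. The magnitudes here come from the made-up data and say nothing about the study's results.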
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1564225
- Resource Relation:
- Conference: IEEE EMBS International Conference on Biomedical & Health Informatics (IEEE-EMBS BHI 2019), Chicago, Illinois, United States of America, May 19-22, 2019
- Country of Publication:
- United States
- Language:
- English