Abstract
One of the primary challenges in developing Chest X-Ray (CXR) interpretation models has been the lack of large datasets with multilabel image annotations extracted from radiology reports. This paper proposes CXRlabeler, a CXR labeler that simultaneously extracts fourteen observations from free-text radiology reports and labels each one as positive or negative. It fine-tunes a pre-trained language model, AWD-LSTM, on a corpus of CXR radiology impressions and then uses it as the base of a multilabel classifier. Experiments demonstrate that language model fine-tuning increases the classifier's F1 score by 12.53%. Overall, CXRlabeler achieves a 96.17% F1 score on the MIMIC-CXR dataset. To further test its generalization, the model is also evaluated on the PadChest dataset. This evaluation shows that the CXRlabeler approach remains helpful in a different language environment, and the model (available at https://github.com/MaramMonshi/CXRlabeler) can assist researchers in labeling CXR datasets with fourteen observations.
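The abstract describes a multilabel classifier that marks each report as positive or negative for fourteen observations and is scored by F1. The following is a minimal plain-Python sketch of that output stage only, under stated assumptions: the observation names are taken to be the fourteen CheXpert observations (the abstract does not list them), each observation receives an independent sigmoid thresholded at 0.5, and F1 is micro-averaged (the abstract does not say which averaging the reported 96.17% uses). `logits_to_labels` and `micro_f1` are hypothetical helpers, not functions from the CXRlabeler repository, and the fine-tuned AWD-LSTM encoder that would produce the logits is omitted.

```python
import math
from typing import Dict, List, Sequence

# Assumption: the fourteen CheXpert observations (also used by MIMIC-CXR's
# released labels). The abstract itself does not name the observations.
OBSERVATIONS: List[str] = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly",
    "Lung Lesion", "Lung Opacity", "Edema", "Consolidation",
    "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion",
    "Pleural Other", "Fracture", "Support Devices",
]


def logits_to_labels(logits: Sequence[float], threshold: float = 0.5) -> Dict[str, int]:
    """Hypothetical helper: map one report's 14 logits to 1 (positive) / 0 (negative).

    Each observation gets its own sigmoid, so several observations can be
    positive for the same report (multilabel), unlike a softmax over
    mutually exclusive classes. The 0.5 threshold is an assumption.
    """
    if len(logits) != len(OBSERVATIONS):
        raise ValueError(f"expected {len(OBSERVATIONS)} logits, got {len(logits)}")
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {name: int(p >= threshold) for name, p in zip(OBSERVATIONS, probs)}


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: micro-averaged F1 = 2TP / (2TP + FP + FN).

    True positives, false positives and false negatives are pooled over
    every (report, observation) pair before computing a single score.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy logits: only the last observation ("Support Devices") is confidently positive.
    labels = logits_to_labels([-3.0] * 13 + [2.0])
    print([name for name, v in labels.items() if v])

    # Toy ground truth vs. predictions for two reports over four observations.
    print(micro_f1([[1, 0, 1, 0], [0, 1, 0, 0]], [[1, 0, 0, 0], [0, 1, 1, 0]]))
```

Micro-averaging pools counts across all observations, so frequent findings dominate the score; a macro average (the mean of per-observation F1 scores) weights rare findings equally. That choice can shift a reported F1 noticeably, which matters when comparing labelers.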
This material is based upon work supported by the Google Cloud Research credit program.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Monshi, M.M.A., Poon, J., Chung, V., Monshi, F.M. (2021). Labeling Chest X-Ray Reports Using Deep Learning. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12893. Springer, Cham. https://doi.org/10.1007/978-3-030-86365-4_55
DOI: https://doi.org/10.1007/978-3-030-86365-4_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86364-7
Online ISBN: 978-3-030-86365-4
eBook Packages: Computer Science, Computer Science (R0)