Abstract
Self-supervised pre-training on unlabeled images has shown promising results in the medical domain. Recently, methods using text supervision from companion text such as radiological reports have improved upon these results even further. However, most works in the medical domain focus on image classification downstream tasks and do not study more localized tasks such as semantic segmentation or object detection. We therefore propose a novel evaluation framework consisting of 18 localized tasks, including semantic segmentation and object detection, on five public chest radiography datasets. Using this framework, we study the effectiveness of existing text-supervised methods and compare them with image-only self-supervised methods and with transfer from classification, in more than 1200 evaluation runs. Our experiments show that text-supervised methods outperform all other methods on 13 out of 18 tasks, making them the preferred choice. In conclusion, image-only contrastive methods provide a strong baseline when no reports are available, while transfer from classification, even in-domain, does not perform well as pre-training for localized tasks.
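The text-supervised pre-training compared here follows the CLIP/ConVIRT family [37, 51], which aligns image and report embeddings with a symmetric contrastive objective. As a rough illustration only (not the paper's implementation; the function name, embedding dimensions, and temperature are arbitrary choices for this sketch), the loss can be written as:

```python
import numpy as np

def symmetric_info_nce(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE loss over paired image/report embeddings.

    Row i of img_emb and row i of txt_emb are a matched pair; all other
    rows in the batch act as negatives, as in CLIP/ConVIRT-style
    pre-training. Returns the mean of the image-to-text and
    text-to-image cross-entropy terms.
    """
    # L2-normalize so the dot product is a cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix

    def cross_entropy(l):
        # Row-wise log-softmax; the correct "class" is the diagonal entry
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the two retrieval directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy check: perfectly matched pairs should score a lower loss than
# deliberately mismatched (reversed) pairs
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = symmetric_info_nce(emb, emb)
loss_shuffled = symmetric_info_nce(emb, emb[::-1])
```

After pre-training with such an objective, the text encoder is discarded and the image encoder is fine-tuned or probed on the localized downstream tasks.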
Notes
1. Note that we only contribute the selection of these datasets and the definition of tasks on them; we do not contribute any new datasets or ground-truth labels.
2. Note that NIH CXR is a small subset of the ChestX-ray8 [36] dataset that contains detection targets.
References
Bardes, A., Ponce, J., LeCun, Y.: VICReg: variance-invariance-covariance regularization for self-supervised learning. arXiv:2105.04906 (2021)
Caron, M., Touvron, H., Misra, I., et al.: Emerging properties in self-supervised vision transformers. In: ICCV, pp. 9630–9640 (2021). https://doi.org/10.1109/ICCV48922.2021.00951
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607 (2020)
Chen, X., He, K.: Exploring simple Siamese representation learning. arXiv:2011.10566 (2020)
Desai, K., Johnson, J.: VirTex: learning visual representations from textual annotations. arXiv:2006.06666 (2020)
Desai, S., Baghal, A., Wongsurawat, T., et al.: Data from chest imaging with clinical and genomic correlates representing a rural COVID-19 positive population. Cancer Imaging Arch. (2020). https://doi.org/10.7937/tcia.2020.py71-5978
Ermolov, A., Siarohin, A., Sangineto, E., Sebe, N.: Whitening for self-supervised representation learning. arXiv:2007.06346 (2020)
Gazda, M., Gazda, J., Plavka, J., Drotar, P.: Self-supervised deep convolutional neural network for chest X-ray classification. arXiv:2103.03055 (2021)
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23), 215–220 (2000)
Grill, J.B., Strub, F., Altché, F., et al.: Bootstrap your own latent - a new approach to self-supervised learning. In: NeurIPS, pp. 21271–21284 (2020)
He, K., Fan, H., Wu, Y., et al.: Momentum contrast for unsupervised visual representation learning. In: CVPR, pp. 9726–9735 (2020). https://doi.org/10.1109/CVPR42600.2020.00975
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., et al.: Learning deep representations by mutual information estimation and maximization. arXiv:1808.06670 (2019)
Hénaff, O.J., Srinivas, A., et al.: Data-efficient image recognition with contrastive predictive coding. In: ICML, pp. 4182–4192 (2020)
Irvin, J., Rajpurkar, P., Ko, M., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: AAAI, pp. 590–597 (2019)
JF-Healthcare: object-CXR - automatic detection of foreign objects on chest X-rays. MIDL (2020). https://jfhealthcare.github.io/object-CXR/
Jia, C., Yang, Y., Xia, Y., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML, pp. 4904–4916 (2021)
Johnson, A., Lungren, M., Peng, Y., et al.: MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet (2019). https://doi.org/10.13026/8360-t248
Johnson, A., Pollard, T., Berkowitz, S., et al.: MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6(317), 1–8 (2019). https://doi.org/10.1038/s41597-019-0322-0
Johnson, A., Pollard, T., Mark, R., Berkowitz, S., Horng, S.: MIMIC-CXR database (version 2.0.0). PhysioNet (2019). https://doi.org/10.13026/C2JT1Q
Li, J., Zhou, P., Xiong, C., Hoi, S.C.H.: Prototypical contrastive learning of unsupervised representations. arXiv:2005.04966 (2021)
Liao, R., et al.: Multimodal representation learning via maximization of local mutual information. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 273–283. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87196-3_26
Liu, Z., Stent, S., Li, J., et al.: LocTex: learning data-efficient visual representations from localized textual supervision. arXiv:2108.11950 (2021)
Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. arXiv:1912.01991 (2019)
van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv:1807.03748 (2019)
Radford, A., Kim, J.W., Hallacy, C., et al.: Learning transferable visual models from natural language supervision. arXiv:2103.00020 (2021)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767 (2018)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Sariyildiz, M.B., Perez, J., Larlus, D.: Learning visual representations with caption annotations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 153–170. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_10
Shih, G., Wu, C.C., Halabi, S.S., et al.: Augmenting the National Institutes of Health chest radiograph dataset with expert annotations of possible pneumonia. Radiol. Artif. Intell. 1 (2019). https://doi.org/10.1148/ryai.2019180041
Society for Imaging Informatics in Medicine: SIIM-ACR pneumothorax segmentation (2019). https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation
Sowrirajan, H., Yang, J., Ng, A.Y., Rajpurkar, P.: MoCo-CXR: MoCo pretraining improves representation and transferability of chest X-ray models. arXiv:2010.05352 (2021)
Sriram, A., Muckley, M., Sinha, K., et al.: COVID-19 prognosis via self-supervised representation learning and multi-image prediction. arXiv:2101.04909 (2021)
Tang, H., Sun, N., Li, Y.: Segmentation model of the opacity regions in the chest X-rays of the COVID-19 patients in the US rural areas and the application to the disease severity. medRxiv (2020). https://doi.org/10.1101/2020.10.19.20215483
Wang, X., Peng, Y., Lu, L., et al.: ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: CVPR, pp. 3462–3471 (2017). https://doi.org/10.1109/CVPR.2017.369
Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR, pp. 3733–3742 (2018). https://doi.org/10.1109/CVPR.2018.00393
Xie, Z., Lin, Y., Zhang, Z., et al.: Propagate yourself: exploring pixel-level consistency for unsupervised visual representation learning. arXiv:2011.10043 (2020)
Zbontar, J., Jing, L., Misra, I., et al.: Barlow twins: self-supervised learning via redundancy reduction. arXiv:2103.03230 (2021)
Zhang, Y., Jiang, H., Miura, Y., et al.: Contrastive learning of medical visual representations from paired images and text. arXiv:2010.00747 (2020)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Müller, P., Kaissis, G., Zou, C., Rueckert, D. (2022). Radiological Reports Improve Pre-training for Localized Imaging Tasks on Chest X-Rays. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13435. Springer, Cham. https://doi.org/10.1007/978-3-031-16443-9_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16442-2
Online ISBN: 978-3-031-16443-9
eBook Packages: Computer Science (R0)