Abstract
Because radiology images share common anatomical content, images and their corresponding reports exhibit high cross-sample similarity. This inherent data bias can predispose automatic report generation models to learn entangled, spurious representations that yield misdiagnostic reports. To tackle this, we propose a novel CounterFactual Explanations-based framework (CoFE) for radiology report generation. Counterfactual explanations are a potent tool for understanding how an algorithm’s decisions would change under “what if” scenarios. Leveraging this concept, CoFE learns non-spurious visual representations by contrasting the representations of factual and counterfactual images. Specifically, we derive a counterfactual image by swapping patches between a positive and a negative sample until the predicted diagnosis shifts, where the positive and negative samples are the most semantically similar pair carrying different diagnosis labels. Additionally, CoFE employs a learnable prompt, encapsulating both factual and counterfactual content, to efficiently fine-tune a pre-trained large language model and obtain a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports, outperforming prior methods on both language generation and clinical efficacy metrics.
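To make the patch-swapping step concrete, the following is a minimal sketch of how a counterfactual image could be derived: patches from the negative sample are copied into the factual image until the classifier's predicted diagnosis flips. The `classifier` interface, the raster-scan swap order, and the 16-pixel patch size are illustrative assumptions, not details taken from the paper.

```python
import torch

def make_counterfactual(factual, negative, classifier, patch=16):
    """Sketch: derive a counterfactual image by copying patches from a
    negative sample (similar anatomy, different diagnosis label) into the
    factual image until the predicted diagnosis shifts.

    `classifier` (image -> class logits) and the fixed 16x16 patch grid
    are assumptions for illustration, not the paper's exact procedure.
    """
    cf = factual.clone()                                   # (C, H, W)
    base_label = classifier(cf.unsqueeze(0)).argmax(dim=-1)
    _, H, W = cf.shape
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            # Swap one patch from the negative sample into the factual image.
            cf[:, y:y + patch, x:x + patch] = negative[:, y:y + patch, x:x + patch]
            new_label = classifier(cf.unsqueeze(0)).argmax(dim=-1)
            if not torch.equal(new_label, base_label):
                return cf                                  # diagnosis shifted: counterfactual found
    return cf
```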
Code is available at: https://github.com/mlii0117/CoFE.
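For the contrastive step, a minimal InfoNCE-style formulation is sketched below: the factual image embedding is pulled toward its paired report embedding and pushed away from the counterfactual embedding. The single-negative pairing scheme and the temperature value are assumptions about how such a contrast could be set up, not CoFE's exact loss.

```python
import torch
import torch.nn.functional as F

def factual_contrastive_loss(z_factual, z_report, z_counterfactual, tau=0.07):
    """Sketch of a contrastive objective over factual vs. counterfactual
    representations. All inputs are (B, D) embeddings; the one-positive /
    one-negative layout is an illustrative assumption.
    """
    z_f = F.normalize(z_factual, dim=-1)
    z_r = F.normalize(z_report, dim=-1)
    z_c = F.normalize(z_counterfactual, dim=-1)
    pos = (z_f * z_r).sum(-1, keepdim=True) / tau          # (B, 1) positive logits
    neg = (z_f * z_c).sum(-1, keepdim=True) / tau          # (B, 1) negative logits
    logits = torch.cat([pos, neg], dim=-1)                 # (B, 2)
    labels = torch.zeros(logits.size(0), dtype=torch.long,
                         device=logits.device)             # positive at index 0
    return F.cross_entropy(logits, labels)
```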
Acknowledgements
This work is supported by ARC DP210101347.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, M. et al. (2025). Contrastive Learning with Counterfactual Explanations for Radiology Report Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_10
DOI: https://doi.org/10.1007/978-3-031-72775-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72774-0
Online ISBN: 978-3-031-72775-7