Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

  • Conference paper

Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15101)

Abstract

Because radiology images share common anatomical content, the images and their corresponding reports exhibit high similarity. This inherent data bias can predispose automatic report generation models to learning entangled, spurious representations that result in misdiagnostic reports. To tackle this, we propose a novel CounterFactual Explanations-based framework (CoFE) for radiology report generation. Counterfactual explanations are a potent tool for understanding how the decisions made by an algorithm can be changed, by posing “what if” scenarios. Leveraging this concept, CoFE learns non-spurious visual representations by contrasting the representations of factual and counterfactual images. Specifically, we derive counterfactual images by swapping a patch between positive and negative samples until a shift in the predicted diagnosis occurs, where the positive and negative samples are those that are most semantically similar yet carry different diagnosis labels. In addition, CoFE employs a learnable prompt to efficiently fine-tune a pre-trained large language model, encapsulating both factual and counterfactual content to provide a more generalizable prompt representation. Extensive experiments on two benchmarks demonstrate that leveraging counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports, outperforming existing methods on both language generation and clinical efficacy metrics.

Code is available at: https://github.com/mlii0117/CoFE.
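
To make the patch-swapping idea concrete, below is a minimal PyTorch-style sketch of the two core steps described in the abstract: counterfactual construction and factual-vs-counterfactual contrast. This is an illustration under stated assumptions, not the authors' released implementation: the function names, the fixed 16 × 16 patch grid, the external `classifier`, and the InfoNCE-style loss are all hypothetical; consult the repository above for the actual method.

```python
import torch
import torch.nn.functional as F

def make_counterfactual(fact_img, neg_img, classifier, patch=16):
    # Hypothetical sketch: swap patches from the negative sample into the
    # factual image until the classifier's predicted diagnosis shifts.
    cf = fact_img.clone()                                  # (C, H, W)
    with torch.no_grad():
        orig = classifier(cf.unsqueeze(0)).argmax(dim=-1)  # original diagnosis
        _, H, W = cf.shape
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                cf[:, y:y+patch, x:x+patch] = neg_img[:, y:y+patch, x:x+patch]
                if classifier(cf.unsqueeze(0)).argmax(dim=-1).item() != orig.item():
                    return cf                              # diagnosis flipped
    return cf

def counterfactual_contrastive_loss(z_fact, z_pos, z_cf, tau=0.07):
    # Generic InfoNCE-style contrast (an assumption, not the paper's exact
    # loss): pull the factual embedding toward its positive view and push it
    # away from the counterfactual embedding. All inputs are (B, D).
    z_fact, z_pos, z_cf = (F.normalize(z, dim=-1) for z in (z_fact, z_pos, z_cf))
    pos = torch.exp((z_fact * z_pos).sum(-1) / tau)
    neg = torch.exp((z_fact * z_cf).sum(-1) / tau)
    return -torch.log(pos / (pos + neg)).mean()
```

Per the abstract, the negative sample is the study most semantically similar to the factual one but with a different diagnosis label; that retrieval step, and the learnable-prompt tuning of the language model, are omitted from this sketch.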

Notes

  1. https://pubmed.ncbi.nlm.nih.gov.

Acknowledgements

This work was supported by ARC DP210101347.

Author information

Corresponding author

Correspondence to Xiaodan Liang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, M. et al. (2025). Contrastive Learning with Counterfactual Explanations for Radiology Report Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72775-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72774-0

  • Online ISBN: 978-3-031-72775-7

  • eBook Packages: Computer Science, Computer Science (R0)
