
Prior tissue knowledge-driven contrastive learning for brain CT report generation

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Writing medical reports for brain computed tomography (CT) is essential for radiologists diagnosing cerebrovascular diseases. Recent advances in medical report generation have driven significant progress in producing accurate descriptions of radiology images, especially chest X-rays. Unlike the mainstream chest X-ray report generation task, producing a brain CT report poses extreme challenges for language models: (1) severe visual data bias caused by multiple serialized images and sparse lesions, and (2) serious textual data bias caused by the unbalanced distribution of pathological words. To alleviate this significant visual and textual data bias, we propose a prior tissue knowledge-driven contrastive learning model to improve brain CT report generation. Specifically, we first summarize prior tissue knowledge from the visual and textual modalities, in the form of Scan-Tissue and Report-Tissue labels, to capture the clinical experience of brain specialists and enhance the feature representations. Then, driven by this prior tissue knowledge, a multi-label retrieval-based contrastive learning module is proposed to effectively separate positive and negative imaging-report pairs by reducing the disturbance caused by hard-negative samples. In this way, the model learns the essential, generalized consistency between visual and textual features, which relieves the multimodal data bias and boosts the generation of high-quality reports. We comprehensively compare the model with previous state-of-the-art methods on the BCT-CHR dataset. The remarkable performance of our model demonstrates that our knowledge-aware contrastive learning paradigm can effectively benefit brain CT report generation.
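To illustrate the multi-label retrieval-based contrastive idea described above, the sketch below is a deliberately simplified, hypothetical rendering (not the authors' implementation): reports that share any tissue label with an image are treated as positives and excluded from the negative set, which mimics filtering out hard-negative samples. The function and variable names (`knowledge_aware_contrastive_loss`, `tissue_labels`) and the InfoNCE-style form are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knowledge_aware_contrastive_loss(img_embs, rep_embs, tissue_labels, tau=0.1):
    """Hypothetical sketch of a multi-label contrastive objective.

    For each image i, every report whose tissue-label set overlaps
    image i's labels is a positive; only label-disjoint reports
    contribute to the negative term, so label-sharing hard negatives
    no longer disturb the loss.
    """
    losses = []
    for i, img in enumerate(img_embs):
        # Temperature-scaled exponentiated similarities to all reports.
        sims = [math.exp(cosine(img, r) / tau) for r in rep_embs]
        pos_idx = [j for j, lbl in enumerate(tissue_labels)
                   if tissue_labels[i] & lbl]
        neg_sum = sum(s for j, s in enumerate(sims)
                      if not (tissue_labels[i] & tissue_labels[j]))
        for j in pos_idx:
            # InfoNCE-style term: pull positives up against disjoint negatives.
            losses.append(-math.log(sims[j] / (sims[j] + neg_sum)))
    return sum(losses) / len(losses)
```

With well-aligned embeddings and disjoint tissue labels across samples, the loss approaches zero; it grows as negatives become as similar to an image as its paired report.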


Data availability

The dataset is not publicly available because we do not have permission to make it public, but it is available from the corresponding author upon reasonable request.

Notes

  1. https://github.com/fxsjy/jieba.


Author information

Authors and Affiliations

Authors

Contributions

Yanzhao Shi, Xiaodan Zhang and Ying Liu contributed to the study conception and design. Yanzhao Shi wrote the first draft of the manuscript, and Xiaodan Zhang and Junzhong Ji performed the review and editing. Ying Liu, Zheng Wang, and Huimin Xu performed data collection and clinical analysis. Ying Liu contributed crucial medical knowledge to the work. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiaodan Zhang or Ying Liu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shi, Y., Ji, J., Zhang, X. et al. Prior tissue knowledge-driven contrastive learning for brain CT report generation. Multimedia Systems 30, 98 (2024). https://doi.org/10.1007/s00530-024-01289-w


  • DOI: https://doi.org/10.1007/s00530-024-01289-w
