Medical Report Generation Based on Segment-Enhanced Contrastive Representation Learning

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14303)

Abstract

Automated radiology report generation has the potential to improve radiology reporting and alleviate the workload of radiologists. However, the task poses unique challenges due to the limited availability of medical data and the presence of data bias. To maximize the utility of available data and reduce data bias, we propose MSCL (Medical image Segmentation with Contrastive Learning), a framework that uses the Segment Anything Model (SAM) to segment organs, abnormalities, bones, and other structures, so that the model attends to the meaningful regions of interest (ROIs) in the image and obtains better visual representations. We then introduce a supervised contrastive loss that, during training, assigns greater weight to reports that are semantically similar to the target report. This loss is designed to mitigate the impact of data bias and to encourage the model to capture the essential features of a medical image and generate high-quality reports. Experimental results demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance on the public IU X-Ray dataset.
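The weighted supervised contrastive idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `weighted_supcon_loss`, the temperature value, and the toy embeddings are hypothetical, and the paper's actual loss operates on learned image and report representations rather than hand-built vectors.

```python
import numpy as np

def weighted_supcon_loss(image_emb, report_embs, sim_weights, tau=0.1):
    """Sketch of a weighted supervised contrastive loss.

    image_emb:   (d,)   L2-normalized embedding of the input image.
    report_embs: (n, d) L2-normalized embeddings of candidate reports.
    sim_weights: (n,)   nonnegative weights; reports semantically similar
                        to the target report receive larger weight.
    tau:                softmax temperature (assumed value).
    """
    logits = report_embs @ image_emb / tau        # cosine similarities / tau
    logits = logits - logits.max()                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum() # softmax over candidates
    w = sim_weights / sim_weights.sum()           # normalize the weights
    # weight each report's log-likelihood by its semantic similarity
    return float(-(w * np.log(probs + 1e-12)).sum())
```

With all weight on a report aligned with the image, the loss is small; shifting the weight to a mismatched report drives it up, which is what pushes the image encoder toward semantically relevant reports rather than the single paired one.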



Acknowledgements

This research is supported by the National Key Research and Development Program of China (No. 2021ZD0113203), the National Natural Science Foundation of China (No. 62106105), the CCF-Tencent Open Research Fund (No. RAGR20220122), the CCF-Zhipu AI Large Model Fund (No. CCF-Zhipu202315), the Scientific Research Starting Foundation of Nanjing University of Aeronautics and Astronautics (No. YQR21022), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.

Author information

Corresponding author: Piji Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, R., Wang, X., Dai, H., Gao, P., Li, P. (2023). Medical Report Generation Based on Segment-Enhanced Contrastive Representation Learning. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_65

  • DOI: https://doi.org/10.1007/978-3-031-44696-2_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44695-5

  • Online ISBN: 978-3-031-44696-2

