Abstract
Automated radiology report generation has the potential to improve radiology reporting and alleviate the workload of radiologists. However, the task poses unique challenges due to the limited availability of medical data and the presence of data bias. To maximize the utility of the available data and reduce data bias, we propose MSCL (Medical image Segmentation with Contrastive Learning), a framework that uses the Segment Anything Model (SAM) to segment organs, abnormalities, bones, and other structures, so that the model attends more closely to the meaningful ROIs in the image and obtains better visual representations. We then introduce a supervised contrastive loss that, during training, assigns greater weight to reports that are semantically similar to the target. This loss is designed to mitigate the impact of data bias and encourage the model to capture the essential features of a medical image and generate high-quality reports. Experimental results demonstrate the effectiveness of the proposed model, which achieves state-of-the-art performance on the public IU X-Ray dataset.
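The abstract only summarizes the weighted supervised contrastive loss. As one plausible reading of "assigns more weight to reports that are semantically similar to the target", the sketch below treats each in-batch report as a soft positive whose contribution to a contrastive (log-softmax) objective is scaled by its semantic similarity to the ground-truth report. All names, shapes, and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_contrastive_loss(image_emb, report_embs, sims, tau=0.1):
    """Similarity-weighted supervised contrastive loss (illustrative sketch).

    image_emb:   (d,)   embedding of the input image
    report_embs: (n, d) embeddings of the candidate reports in the batch
    sims:        (n,)   semantic similarity of each report to the target
                        report; more-similar reports receive more weight
    tau:         softmax temperature
    """
    # cosine similarity between the image and every report
    image = image_emb / np.linalg.norm(image_emb)
    reports = report_embs / np.linalg.norm(report_embs, axis=1, keepdims=True)
    logits = reports @ image / tau                       # shape (n,)

    # numerically stable log-softmax over the batch of reports
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))

    # normalise the semantic similarities into soft-positive weights
    weights = sims / sims.sum()

    # weighted cross-entropy: similar reports contribute more to the loss
    return float(-(weights * log_probs).sum())

# toy usage: one image, four candidate reports, the first most similar
rng = np.random.default_rng(0)
img = rng.normal(size=8)
reps = rng.normal(size=(4, 8))
sims = np.array([1.0, 0.2, 0.1, 0.1])
loss = weighted_contrastive_loss(img, reps, sims)
```

With a one-hot `sims` vector this reduces to the standard InfoNCE-style cross-entropy against a single positive, which is why weighting semantically similar reports can be seen as a softened version of ordinary contrastive training.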
References
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: IEEvaluation@ACL (2005)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for contrastive learning of visual representations. CoRR abs/2002.05709 (2020). https://arxiv.org/abs/2002.05709
Chen, Y.J., et al.: Representative image feature extraction via contrastive learning pretraining for chest X-ray report generation (2023)
Chen, Z., Song, Y., Chang, T.H., Wan, X.: Generating radiology reports via memory-driven transformer. arXiv preprint arXiv:2010.16056 (2020)
Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc. 23(2), 304–310 (2015)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.B.: Momentum contrast for unsupervised visual representation learning. CoRR abs/1911.05722 (2019). http://arxiv.org/abs/1911.05722
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017). https://doi.org/10.1109/CVPR.2017.243
Irvin, J.A., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: AAAI Conference on Artificial Intelligence (2019)
Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging reports. In: Annual Meeting of the Association for Computational Linguistics (2017)
Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Li, P., Zhang, H., Liu, X., Shi, S.: Rigid formats controlled text generation. In: ACL, pp. 742–751 (2020)
Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. arXiv preprint arXiv:1805.08298 (2018)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Annual Meeting of the Association for Computational Linguistics (2004)
Liu, F., Ge, S., Wu, X.: Competence-based multimodal curriculum learning for medical report generation. In: Annual Meeting of the Association for Computational Linguistics (2022)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
Ma, J., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning (2010)
Nguyen, H.T., Nie, D., Badamdorj, T., Liu, Y., Zhu, Y., Truong, J., Cheng, L.: Automated generation of accurate & fluent medical X-ray reports. arXiv preprint arXiv:2108.12126 (2021)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Annual Meeting of the Association for Computational Linguistics (2002)
Shin, H.C., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., Summers, R.M.: Learning to read chest X-rays: recurrent neural cascade model for automated image annotation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2497–2506 (2016)
Srinivasan, P., Thapar, D., Bhavsar, A., Nigam, A.: Hierarchical x-ray report generation via pathology tags and multi head attention. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision - ACCV 2020, pp. 600–616. Springer, Cham (2021)
Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 945–953 (2015). https://doi.org/10.1109/ICCV.2015.114
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3156–3164 (2015). https://doi.org/10.1109/CVPR.2015.7298935
Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R.M.: TieNet: text-image embedding network for common thorax disease classification and reporting in chest X-rays. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9049–9058 (2018)
Xue, Y., Xu, T., Long, L.R., Xue, Z., Antani, S.K., Thoma, G.R., Huang, X.: Multimodal recurrent model with attention for automated radiology report generation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (2018)
Yin, C., Li, P., Ren, Z.: CtrlStruct: dialogue structure learning for open-domain response generation. In: Proceedings of the ACM Web Conference 2023, WWW 2023, pp. 1539–1550. Association for Computing Machinery, New York (2023). https://doi.org/10.1145/3543507.3583285
Zhang, Y., Wang, X., Xu, Z., Yu, Q., Yuille, A.L., Xu, D.: When radiology report generation meets knowledge graph. CoRR abs/2002.08277 (2020). https://arxiv.org/abs/2002.08277
Acknowledgements
This research is supported by the National Key Research and Development Program of China (No. 2021ZD0113203), the National Natural Science Foundation of China (No. 62106105), the CCF-Tencent Open Research Fund (No. RAGR20220122), the CCF-Zhipu AI Large Model Fund (No. CCF-Zhipu202315), the Scientific Research Starting Foundation of Nanjing University of Aeronautics and Astronautics (No. YQR21022), and the High Performance Computing Platform of Nanjing University of Aeronautics and Astronautics.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, R., Wang, X., Dai, H., Gao, P., Li, P. (2023). Medical Report Generation Based on Segment-Enhanced Contrastive Representation Learning. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science(), vol 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_65
DOI: https://doi.org/10.1007/978-3-031-44696-2_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44695-5
Online ISBN: 978-3-031-44696-2
eBook Packages: Computer Science, Computer Science (R0)