Reinforced Transformer for Medical Image Captioning

Xiong, Yuxuan; Du, Bo; Yan, Pingkun

doi:10.1007/978-3-030-32692-0_77

Reinforced Transformer for Medical Image Captioning

Yuxuan Xiong¹²,
Bo Du¹² &
Pingkun Yan¹³

Conference paper
First Online: 10 October 2019

6679 Accesses
33 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11861))

Abstract

Computerized medical image report generation is of great significance in automating the workflow of medical diagnosis and treatment for reducing health disparities. However, this task presents several challenges, where the generated medical image report should be precise, coherent and contain heterogeneous information. Current deep learning based medical image captioning models rely on recurrent neural networks and only extract top-down visual features, which make them slow and prone to generate incoherent and hard to comprehend reports. To tackle this challenging problem, this paper proposes a hierarchical Transformer based medical imaging report generation model. Our proposed model consists of two parts: (1) An Image Encoder extracts heuristic visual features by a bottom-up attention mechanism; (2) a non-recurrent Captioning Decoder improves the computational efficiency by parallel computation. The former identifies regions of interest via a bottom-up attention module and extracts top-down visual features. Then the Transformer based captioning decoder generates a coherent paragraph of medical imaging report. The proposed model is trained by using a self-critical reinforcement learning method. We evaluate the proposed model on publicly available datasets of IU X-ray. The experiment results show that our proposed model has improved the performance in BLEU-1 by more than 50% compared with other state-of-the-art image captioning methods.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Ba, L.J., Kiros, R., Hinton, G.E.: Layer normalization. CoRR (2016)
Google Scholar
Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. JAMIA 23(2), 304–310 (2016)
Google Scholar
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
Google Scholar
Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 (2018)
Google Scholar
Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. NeurIPS 2018, 1537–1547 (2018)
Google Scholar
Rajpurkar, P., et al.: CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. CoRR abs/1711.05225 (2017)
Google Scholar
Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: CVPR (2017)
Google Scholar
Shin, H., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., Summers, R.M.: Learning to read chest x-rays: recurrent neural cascade model for automated image annotation. In: CVPR, pp. 2497–2506 (2016)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: NeruIPS (2017)
Google Scholar
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: CVPR (2017)
Google Scholar
Zhang, Z., Xie, Y., Xing, F., McGough, M., Yang, L.: MDNet: a semantically and visually interpretable medical image diagnosis network. In: CVPR (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, China
Yuxuan Xiong & Bo Du
Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Pingkun Yan

Authors

Yuxuan Xiong
View author publications
You can also search for this author in PubMed Google Scholar
Bo Du
View author publications
You can also search for this author in PubMed Google Scholar
Pingkun Yan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bo Du .

Editor information

Editors and Affiliations

Korea University, Seoul, Korea (Republic of)
Heung-Il Suk
University of North Carolina, Chapel Hill, NC, USA
Mingxia Liu
Rensselaer Polytechnic Institute, Troy, NY, USA
Pingkun Yan
University of North Carolina, Chapel Hill, NC, USA
Chunfeng Lian

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Xiong, Y., Du, B., Yan, P. (2019). Reinforced Transformer for Medical Image Captioning. In: Suk, HI., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science(), vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_77

Download citation

DOI: https://doi.org/10.1007/978-3-030-32692-0_77
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32691-3
Online ISBN: 978-3-030-32692-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)