Skip to main content

Reinforced Transformer for Medical Image Captioning

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11861))

Abstract

Computerized medical image report generation is of great significance in automating the workflow of medical diagnosis and treatment for reducing health disparities. However, this task presents several challenges, where the generated medical image report should be precise, coherent and contain heterogeneous information. Current deep learning based medical image captioning models rely on recurrent neural networks and only extract top-down visual features, which make them slow and prone to generate incoherent and hard to comprehend reports. To tackle this challenging problem, this paper proposes a hierarchical Transformer based medical imaging report generation model. Our proposed model consists of two parts: (1) An Image Encoder extracts heuristic visual features by a bottom-up attention mechanism; (2) a non-recurrent Captioning Decoder improves the computational efficiency by parallel computation. The former identifies regions of interest via a bottom-up attention module and extracts top-down visual features. Then the Transformer based captioning decoder generates a coherent paragraph of medical imaging report. The proposed model is trained by using a self-critical reinforcement learning method. We evaluate the proposed model on publicly available datasets of IU X-ray. The experiment results show that our proposed model has improved the performance in BLEU-1 by more than 50% compared with other state-of-the-art image captioning methods.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Ba, L.J., Kiros, R., Hinton, G.E.: Layer normalization. CoRR (2016)

    Google Scholar 

  2. Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. JAMIA 23(2), 304–310 (2016)

    Google Scholar 

  3. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)

    Google Scholar 

  4. Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 (2018)

    Google Scholar 

  5. Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. NeurIPS 2018, 1537–1547 (2018)

    Google Scholar 

  6. Rajpurkar, P., et al.: CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. CoRR abs/1711.05225 (2017)

    Google Scholar 

  7. Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: CVPR (2017)

    Google Scholar 

  8. Shin, H., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., Summers, R.M.: Learning to read chest x-rays: recurrent neural cascade model for automated image annotation. In: CVPR, pp. 2497–2506 (2016)

    Google Scholar 

  9. Vaswani, A., et al.: Attention is all you need. In: NeruIPS (2017)

    Google Scholar 

  10. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: CVPR (2017)

    Google Scholar 

  11. Zhang, Z., Xie, Y., Xing, F., McGough, M., Yang, L.: MDNet: a semantically and visually interpretable medical image diagnosis network. In: CVPR (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bo Du .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Xiong, Y., Du, B., Yan, P. (2019). Reinforced Transformer for Medical Image Captioning. In: Suk, HI., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science(), vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_77

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-32692-0_77

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32691-3

  • Online ISBN: 978-3-030-32692-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics