DOI: 10.1145/3286606.3286863

Automatic Caption Generation for Medical Images

Published: 10 October 2018

Abstract

With the increasing availability of medical images from different modalities (X-ray, CT, PET, MRI, ultrasound, etc.), and with the huge advances in fast, accurate computing power offered by current graphics processing units, automatic caption generation from medical images has become a new way to improve healthcare and a key method for obtaining better results at lower cost. In this paper, we give a comprehensive overview of the task of image captioning in the medical domain, covering existing models, the benchmark medical image-caption datasets, and the evaluation metrics that have been used to measure the quality of the generated captions.
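The caption-quality metrics surveyed here are largely n-gram overlap measures such as BLEU. As a rough illustration only (not code from the paper), the following is a minimal pure-Python sketch of sentence-level BLEU against a single reference, using clipped n-gram precisions and a brevity penalty; the example captions are invented:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(reference, candidate, n):
    """Modified n-gram precision: candidate counts clipped by reference counts."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(c, ref[g]) for g, c in cand.items()) / total

def bleu(reference, candidate, max_n=2):
    """Sentence-level BLEU with a single reference: geometric mean of
    clipped precisions up to max_n, times a brevity penalty."""
    precisions = [clipped_precision(reference, candidate, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: no penalty when the candidate is at least as long
    # as the reference, exponential penalty otherwise.
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

# Invented example captions, purely for illustration.
ref = "chest x ray shows no acute disease".split()
print(bleu(ref, ref))                              # 1.0 for an exact match
print(bleu(ref, "chest x ray is normal".split()))  # partial overlap, below 1.0
```

Real evaluations (e.g. the ImageCLEF caption tasks) typically use multi-reference, smoothed BLEU variants alongside ROUGE, METEOR, and CIDEr; this sketch only shows the core clipped-precision idea.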



Published In

SCA '18: Proceedings of the 3rd International Conference on Smart City Applications
October 2018, 580 pages
ISBN: 9781450365628
DOI: 10.1145/3286606

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Attention mechanism
  2. CNN
  3. Computer Vision
  4. Deep Neural Networks
  5. Encoder-Decoder framework
  6. Generative models
  7. LSTM
  8. Medical Image Captioning
  9. Natural Language Processing
  10. RNN
  11. Retrieval-based models

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Cited By

  • (2024) From Data to Diagnosis: Enhancing Radiology Reporting With Clinical Features Encoding and Cross-Modal Coherence. IEEE Access, 12, 127341-127356. DOI: 10.1109/ACCESS.2024.3449929
  • (2024) XRaySwinGen: Automatic Medical Reporting for X-ray Exams with Multimodal Model. Heliyon, e27516. DOI: 10.1016/j.heliyon.2024.e27516
  • (2024) Self-Enhanced Attention for Image Captioning. Neural Processing Letters, 56(2). DOI: 10.1007/s11063-024-11527-x
  • (2024) CSAMDT: Conditional Self Attention Memory-Driven Transformers for Radiology Report Generation from Chest X-Ray. Journal of Imaging Informatics in Medicine, 37(6), 2825-2837. DOI: 10.1007/s10278-024-01126-6
  • (2024) Current Approaches and Challenges in Medical Image Analysis and Visually Explainable Artificial Intelligence as Future Opportunities. The Future of Artificial Intelligence and Robotics, 796-811. DOI: 10.1007/978-3-031-60935-0_69
  • (2023) Lenke Classification Report Generation Method for Scoliosis Based on Spatial and Context Dual Attention. Applied Sciences, 13(13), 7981. DOI: 10.3390/app13137981
  • (2023) CaptionGenX: Advancements in Deep Learning for Automated Image Captioning. 2023 3rd Asian Conference on Innovation in Technology (ASIANCON), 1-8. DOI: 10.1109/ASIANCON58793.2023.10270020
  • (2023) Vision Transformer and Language Model Based Radiology Report Generation. IEEE Access, 11, 1814-1824. DOI: 10.1109/ACCESS.2022.3232719
  • (2023) Automatic aid diagnosis report generation for lumbar disc MR image based on lightweight artificial neural networks. Biomedical Signal Processing and Control, 86, 105275. DOI: 10.1016/j.bspc.2023.105275
  • (2022) Reconsidering Tourism Destination Images by Exploring Similarities between Travelogue Texts and Photographs. ISPRS International Journal of Geo-Information, 11(11), 553. DOI: 10.3390/ijgi11110553
