Abstract
Deep neural networks have greatly assisted radiologists by automating radiological report generation. Most researchers have focused on improving what the model attends to, using attention mechanisms, reinforcement learning, and other techniques. Most of them, however, have not considered the textual information present in the ground-truth radiological reports. In downstream language tasks such as text classification, word embeddings have played a vital role in extracting textual features. Inspired by this, we empirically study the impact of different word embedding techniques on radiological report generation. In this work, we use a convolutional neural network and a large language model to extract visual and textual features, respectively, and a recurrent neural network to generate the reports. The proposed method outperforms most state-of-the-art methods, achieving the following evaluation scores: BLEU-1 = 0.612, BLEU-2 = 0.610, BLEU-3 = 0.608, BLEU-4 = 0.606, ROUGE = 0.811, and CIDEr = 0.317. This work confirms that a pre-trained large language model gives significantly better results than other word embedding techniques.
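To make the reported BLEU-1 to BLEU-4 scores easier to interpret, the following is a minimal sketch of how a cumulative BLEU-n score can be computed for one generated report against one reference. It is an illustration only, not the evaluation code used in this study: it assumes a single reference, whitespace tokenization, uniform n-gram weights, and no smoothing, whereas standard toolkits usually compute BLEU at corpus level. The function names (`ngrams`, `modified_precision`, `bleu`) and the two example sentences are hypothetical and do not come from the dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Cumulative BLEU-max_n: brevity penalty times the geometric mean of precisions."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch: one zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Candidates that are not longer than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical sentences, invented for illustration only.
reference = "the heart size is normal and the lungs are clear".split()
candidate = "the heart is normal and the lungs are clear".split()

scores = {n: bleu(candidate, reference, max_n=n) for n in range(1, 5)}
for n, score in scores.items():
    print(f"BLEU-{n} = {score:.3f}")
```

Because each higher-order score multiplies in a further n-gram precision that is at most 1, cumulative BLEU-n cannot increase as n grows. Dropping a single word from the candidate, as in this example, already lowers the bigram, trigram, and 4-gram precisions.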
Data Availability
We used a standard, publicly available dataset, which can be accessed at https://openi.nlm.nih.gov/faq#collection
Code Availability
The code for this internal research study is available upon request.
Funding
This work is supported by funds received from King Abdulaziz University, Jeddah, Saudi Arabia.
Author information
Authors and Affiliations
Contributions
FSA: Methodology, Software, Writing and Revising the manuscript critically for important intellectual content, Supervision. NK: Conception and Design of Study, Acquisition of Data, Analysis and interpretation of Data, Methodology, Software, Writing and Revising the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known financial or personal relationships, or other conflicts of interest, that could have appeared to influence the work presented in this paper.
Consent to Participate
This study uses a standard, publicly available dataset; thus, no consent to participate is required.
Consent to Publish
This study uses a standard, publicly available dataset; thus, no consent to publish is required.
Ethical Approval
This study uses a standard, publicly available dataset; thus, no ethical approval is required.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alotaibi, F.S., Kaur, N. Radiological Report Generation from Chest X-ray Images Using Pre-trained Word Embeddings. Wireless Pers Commun 133, 2525–2540 (2023). https://doi.org/10.1007/s11277-024-10886-x
DOI: https://doi.org/10.1007/s11277-024-10886-x