research-article

EfficienTransNet: An Automated Chest X-ray Report Generation Paradigm

Authors:
Chayan Mondal, Curtin University, Bentley, WA, Australia (ORCID: 0000-0002-3871-1065)
Duc-Son Pham, Curtin University, Perth, WA, Australia (ORCID: 0000-0002-4006-7803)
Ashu Gupta, Fiona Stanley Hospital, Murdoch, WA, Australia (ORCID: 0009-0004-3500-0841)
Shreya Ghosh, Curtin University, Bentley, WA, Australia (ORCID: 0000-0002-2639-8374)
Tele Tan, Curtin University, Bentley, WA, Australia (ORCID: 0000-0003-3195-3480)
Tom Gedeon, Curtin University, Bentley, WA, Australia (ORCID: 0000-0001-8356-4909)

MRAC '23: Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing, October 2023, Pages 59–66
https://doi.org/10.1145/3607865.3616174

Published: 29 October 2023

ABSTRACT

Chest X-ray imaging is well established in clinical practice and research as a tool for diagnosing chest diseases. Automating X-ray report generation can address several challenges of manual diagnosis: it speeds up reporting, assists radiologists, and reduces their tedious workload. The key challenge of this automation is to capture abnormal findings accurately while producing a fluent, natural report. In this paper, we introduce EfficienTransNet, an automatic chest X-ray report generation approach based on a CNN-Transformer architecture. EfficienTransNet prioritizes clinical accuracy and also demonstrates improved text generation metrics. Our model incorporates the clinical history or indication to enhance report generation and to align with radiologists' workflow, an input that is mostly overlooked in recent research. On two publicly available X-ray report generation datasets, MIMIC-CXR and IU X-ray, our model yields promising results on natural language evaluation and clinical accuracy metrics. Qualitative results, visualized with Grad-CAM, provide disease location information that helps radiologists interpret the output. By emphasizing radiologists' workflow, the proposed model enhances the explainability and transparency of the report generation process and radiologists' trust in it.
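The abstract mentions clinical accuracy metrics without defining them on this page. A common practice in the report generation literature is to extract binary observation labels from the reference and generated reports with an automatic labeler and then compare the two label sets. The block below is a minimal sketch under that assumption, not the authors' evaluation code: the label names, the `micro_prf` function, and the example reports are hypothetical, and the labels are assumed to have been extracted already.

```python
from typing import Dict, List

# Hypothetical subset of observation names; a real evaluation would use the
# full label set of whichever report labeler is chosen.
LABELS = ["cardiomegaly", "edema", "pleural effusion", "pneumonia"]


def micro_prf(reference: List[Dict[str, int]],
              generated: List[Dict[str, int]]) -> Dict[str, float]:
    """Micro-averaged precision, recall, and F1 over binary labels.

    Each report is a dict mapping a label name to 1 (present) or 0 (absent);
    labels missing from a dict are treated as absent.
    """
    tp = fp = fn = 0
    for ref, gen in zip(reference, generated):
        for label in LABELS:
            in_ref = ref.get(label, 0) == 1
            in_gen = gen.get(label, 0) == 1
            if in_gen and in_ref:
                tp += 1  # finding reported and truly present
            elif in_gen:
                fp += 1  # finding reported but absent in the reference
            elif in_ref:
                fn += 1  # finding present in the reference but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative labels for two report pairs (made-up data).
reference_labels = [
    {"cardiomegaly": 1, "edema": 0},
    {"pleural effusion": 1, "pneumonia": 1},
]
generated_labels = [
    {"cardiomegaly": 1, "edema": 1},
    {"pleural effusion": 1},
]
scores = micro_prf(reference_labels, generated_labels)
print(scores)
```

Micro-averaging pools the counts across all labels and reports, so frequent findings dominate the score; a macro average over labels would weight rare findings more heavily.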

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections
    2. Natural language processing
      1. Natural language generation

Published in
MRAC '23: Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing
October 2023, 88 pages
ISBN: 9798400702884
DOI: 10.1145/3607865
Program Chairs: Shreya Ghosh (Curtin University, Australia), Abhinav Dhall (IIT Ropar, India), Dimitrios Kollias (Queen Mary University of London, UK), Roland Goecke (University of Canberra, Australia), Tom Gedeon (Curtin University, Australia)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
chest x-rays
efficientnetb7
medical imaging
report generation
transformers