DOI: 10.1145/3607865.3616174
research-article

EfficienTransNet: An Automated Chest X-ray Report Generation Paradigm

Published: 29 October 2023

ABSTRACT

The value of chest X-ray imaging for diagnosing chest diseases is well established in clinical and research domains. Automating X-ray report generation can address several challenges of manual diagnosis: it speeds up reporting, assists radiologists, and reduces their tedious workload. The key challenge of this automation, however, is to capture abnormal findings accurately and to produce a fluent, natural report. In this paper, we introduce EfficienTransNet, an automatic chest X-ray report generation approach based on CNN-Transformers. EfficienTransNet prioritizes clinical accuracy and demonstrates improved text generation metrics. Our model incorporates clinical history or indications to enhance the report generation process and align with radiologists' workflow, an input that recent research has mostly overlooked. On two publicly available X-ray report generation datasets, MIMIC-CXR and IU X-ray, our model yields promising results on natural language evaluation and clinical accuracy metrics. Qualitative results, demonstrated with Grad-CAM, provide disease location information to aid radiologists' understanding. By emphasizing radiologists' workflow, our proposed model enhances the explainability, transparency, and trustworthiness of the report generation process for radiologists.


      • Published in

        MRAC '23: Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing
        October 2023
        88 pages
        ISBN: 9798400702884
        DOI: 10.1145/3607865

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States
