Abstract
Developing an interpretable system for generating reports in chest X-ray (CXR) analysis is becoming increasingly crucial in Computer-aided Diagnosis (CAD) systems, enabling radiologists to comprehend the decisions made by these systems. Despite the growth of diverse datasets and methods focusing on report generation, there remains a notable gap in how closely these models' generated reports align with the interpretations of real radiologists. In this study, we tackle this challenge by first introducing the Fine-Grained CXR (FG-CXR) dataset, which provides fine-grained paired information between the captions written by radiologists and the corresponding gaze attention heatmaps for each anatomy. Unlike existing datasets that include a raw gaze sequence alongside a report, with significant misalignment between gaze location and report content, our FG-CXR dataset offers a finer-grained alignment between gaze attention and the diagnosis transcript. Furthermore, our analysis reveals that naively applying black-box image captioning methods to report generation cannot adequately explain which information in a CXR is utilized, and for how long it must be attended, to accurately generate reports. Consequently, we propose a novel explainable radiologist's attention generator network (Gen-XAI) that mimics the diagnosis process of radiologists, explicitly constraining its output to closely align with both the radiologist's gaze attention and the transcript. Finally, we perform extensive experiments to illustrate the effectiveness of our method. Our dataset and checkpoints are available at https://github.com/UARK-AICV/FG-CXR.
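To make the alignment constraint above concrete, below is a minimal sketch of what a two-term training objective of this kind could look like: one term pulls the model's predicted attention toward the recorded radiologist gaze heatmap, while the other is a standard captioning loss on the transcript. All names, tensor shapes, and the choice of MSE for the gaze term are illustrative assumptions, not the paper's actual formulation (which may use a different alignment measure or weighting).

```python
# Hypothetical sketch of a joint gaze-and-transcript objective.
# Nothing here is the paper's API; names and losses are assumptions.
import torch
import torch.nn.functional as F

def joint_loss(pred_heatmap, gaze_heatmap, report_logits, report_tokens,
               lambda_gaze=1.0):
    """Combine a gaze-alignment term with a captioning loss.

    pred_heatmap:  (B, H, W) model-predicted attention for an anatomy region
    gaze_heatmap:  (B, H, W) recorded radiologist gaze attention
    report_logits: (B, T, V) token logits for the generated report
    report_tokens: (B, T)    ground-truth transcript token ids
    """
    # Constrain predicted attention toward the recorded gaze (MSE here;
    # a KL divergence over normalized maps would be an equally plausible choice).
    gaze_term = F.mse_loss(pred_heatmap, gaze_heatmap)
    # Standard teacher-forced cross-entropy on the transcript tokens.
    report_term = F.cross_entropy(
        report_logits.reshape(-1, report_logits.size(-1)),
        report_tokens.reshape(-1),
    )
    return report_term + lambda_gaze * gaze_term

# Toy usage: batch of 2, 16x16 heatmaps, 10-token reports, vocab of 50.
pred = torch.rand(2, 16, 16)
gaze = torch.rand(2, 16, 16)
logits = torch.randn(2, 10, 50)
tokens = torch.randint(0, 50, (2, 10))
print(joint_loss(pred, gaze, logits, tokens))
```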
Acknowledgments
This material is based upon work supported by the National Science Foundation (NSF) under Award No. OIA-1946391 and NSF Award 2223793 (EFRI BRAID), and by the National Institutes of Health (NIH) under Award 1R01CA277739-01.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pham, T.T. et al. (2025). FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation. In: Cho, M., Laptev, I., Tran, D., Yao, A., Zha, H. (eds) Computer Vision – ACCV 2024. ACCV 2024. Lecture Notes in Computer Science, vol 15477. Springer, Singapore. https://doi.org/10.1007/978-981-96-0960-4_5
DOI: https://doi.org/10.1007/978-981-96-0960-4_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0959-8
Online ISBN: 978-981-96-0960-4
eBook Packages: Computer Science, Computer Science (R0)