Abstract
Medical Visual Question Answering (Med-VQA) is a task in Artificial Intelligence in which, given a medical image and a related question, a system must provide an accurate answer. It involves the integration of computer vision, natural language processing, and medical domain knowledge; incorporating such knowledge can improve both the reasoning ability of Med-VQA models and the accuracy of their answers. While knowledge-enhanced Visual Question Answering (VQA) in the general domain has been widely researched, Med-VQA requires further examination owing to its unique characteristics. In this paper, we collect and analyze the currently publicly accessible Med-VQA datasets that include external knowledge. We also critically review the key technologies for combining knowledge with Med-VQA, in terms of both their advancements and their limitations. Finally, we discuss open challenges and future directions for Med-VQA.
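As a concrete illustration of the task framing above, the sketch below shows the structure common to most knowledge-enhanced Med-VQA systems: a visual encoder, a question encoder, and an external-knowledge embedding are fused into a joint representation before answer classification. All module sizes, names, and the single-entity knowledge lookup are illustrative assumptions, not the design of any particular surveyed system.

```python
# A minimal sketch of a knowledge-enhanced Med-VQA model (hypothetical
# architecture; components and dimensions are assumptions for illustration).
import torch
import torch.nn as nn

class KnowledgeEnhancedMedVQA(nn.Module):
    def __init__(self, vocab_size=5000, num_answers=500,
                 num_kb_entities=1000, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Visual encoder: a tiny stand-in for a pretrained CNN backbone.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Question encoder: word embeddings followed by an LSTM.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # External knowledge: learned entity embeddings, assuming one
        # relevant knowledge-base entity has been retrieved per question.
        self.kb_embed = nn.Embedding(num_kb_entities, hidden_dim)
        # Fusion and classification over a fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_tokens, kb_entity_ids):
        v = self.visual_encoder(image)                      # (B, hidden)
        _, (h, _) = self.lstm(self.word_embed(question_tokens))
        q = h[-1]                                           # (B, hidden)
        k = self.kb_embed(kb_entity_ids)                    # (B, hidden)
        fused = torch.cat([v, q, k], dim=-1)                # joint features
        return self.classifier(fused)                       # answer logits

model = KnowledgeEnhancedMedVQA()
logits = model(torch.randn(2, 3, 224, 224),          # dummy images
               torch.randint(0, 5000, (2, 12)),      # dummy questions
               torch.randint(0, 1000, (2,)))         # dummy entity ids
print(logits.shape)  # torch.Size([2, 500])
```

In practice, the visual encoder is a pretrained CNN or vision transformer, and the knowledge component ranges from knowledge-graph embeddings to retrieved textual facts; this sketch fixes only the overall fuse-then-classify pattern that the surveyed methods share.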
Acknowledgements
The present research was supported by the Fundamental Research Funds for the Central Universities under Grant No. 22120220069 and by the National Natural Science Foundation of China under Grant No. 62176185.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, H., Du, H. (2023). Knowledge-Enhanced Medical Visual Question Answering: A Survey (Invited Talk Summary). In: Yang, S., Islam, S. (eds) Web and Big Data. APWeb-WAIM 2022 International Workshops. APWeb-WAIM 2022. Communications in Computer and Information Science, vol 1784. Springer, Singapore. https://doi.org/10.1007/978-981-99-1354-1_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1353-4
Online ISBN: 978-981-99-1354-1