Abstract
Medical Visual Question Answering (Med-VQA) is a task in Artificial Intelligence in which, given a medical image and a related question, a system must provide an accurate answer. It involves the integration of computer vision, natural language processing, and medical domain knowledge; incorporating such knowledge can improve both the reasoning ability of Med-VQA models and the accuracy of their answers. While knowledge-enhanced Visual Question Answering (VQA) in the general domain has been widely researched, Med-VQA requires further examination owing to its unique characteristics. In this paper, we collect and analyze the currently publicly accessible Med-VQA datasets that include external knowledge. We also critically review the key technologies for combining knowledge with Med-VQA, in terms of both their advancements and their limitations. Finally, we discuss open challenges and future directions for Med-VQA.
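As a concrete illustration of the task framing above, the sketch below shows the structure common to most knowledge-enhanced Med-VQA systems: a visual encoder, a question encoder, and an external-knowledge embedding are fused into a joint representation before answer classification. All module sizes, names, and the single-entity knowledge lookup are illustrative assumptions, not the design of any particular surveyed system.

```python
# A minimal sketch of a knowledge-enhanced Med-VQA model (hypothetical
# architecture; components and dimensions are assumptions for illustration).
import torch
import torch.nn as nn

class KnowledgeEnhancedMedVQA(nn.Module):
    def __init__(self, vocab_size=5000, num_answers=500,
                 num_kb_entities=1000, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Visual encoder: a tiny stand-in for a pretrained CNN backbone.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Question encoder: word embeddings followed by an LSTM.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # External knowledge: learned entity embeddings, assuming one
        # relevant knowledge-base entity has been retrieved per question.
        self.kb_embed = nn.Embedding(num_kb_entities, hidden_dim)
        # Fusion and classification over a fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_tokens, kb_entity_ids):
        v = self.visual_encoder(image)                      # (B, hidden)
        _, (h, _) = self.lstm(self.word_embed(question_tokens))
        q = h[-1]                                           # (B, hidden)
        k = self.kb_embed(kb_entity_ids)                    # (B, hidden)
        fused = torch.cat([v, q, k], dim=-1)                # joint features
        return self.classifier(fused)                       # answer logits

model = KnowledgeEnhancedMedVQA()
logits = model(torch.randn(2, 3, 224, 224),          # dummy images
               torch.randint(0, 5000, (2, 12)),      # dummy questions
               torch.randint(0, 1000, (2,)))         # dummy entity ids
print(logits.shape)  # torch.Size([2, 500])
```

In practice, the visual encoder is a pretrained CNN or vision transformer, and the knowledge component ranges from knowledge-graph embeddings to retrieved textual facts; this sketch fixes only the overall fuse-then-classify pattern that the surveyed methods share.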
Acknowledgements
The present research was supported by the Fundamental Research Funds for the Central Universities under Grant No. 22120220069 and by the National Natural Science Foundation of China under Grant No. 62176185.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, H., Du, H. (2023). Knowledge-Enhanced Medical Visual Question Answering: A Survey (Invited Talk Summary). In: Yang, S., Islam, S. (eds) Web and Big Data. APWeb-WAIM 2022 International Workshops. APWeb-WAIM 2022. Communications in Computer and Information Science, vol 1784. Springer, Singapore. https://doi.org/10.1007/978-981-99-1354-1_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1353-4
Online ISBN: 978-981-99-1354-1