
Knowledge-Enhanced Medical Visual Question Answering: A Survey (Invited Talk Summary)

  • Conference paper
  • In: Web and Big Data. APWeb-WAIM 2022 International Workshops (APWeb-WAIM 2022)

Abstract

Medical Visual Question Answering (Med-VQA) is an Artificial Intelligence task in which, given a medical image and a related question, a system must produce an accurate answer. The task requires the integration of computer vision, natural language processing, and medical domain knowledge; incorporating external medical knowledge can improve both the reasoning ability of Med-VQA models and the accuracy of their answers. While knowledge-enhanced Visual Question Answering (VQA) in the general domain has been widely researched, medical VQA warrants separate examination due to its unique characteristics. In this paper, we collect and analyze the currently publicly accessible Med-VQA datasets that provide external knowledge. We also critically review the key technologies used to combine knowledge with Med-VQA, covering both their advances and their limitations. Finally, we discuss the remaining challenges and future directions for Med-VQA.
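As a concrete illustration of the task formulation above, the following is a minimal sketch in PyTorch of a knowledge-enhanced Med-VQA model. It treats answering as classification over a fixed answer vocabulary and fuses image, question, and retrieved knowledge embeddings by late concatenation; all module choices and sizes here are illustrative assumptions, not a specific method from the surveyed literature.

```python
# Minimal knowledge-enhanced Med-VQA sketch (illustrative assumptions only):
# answering is framed as classification over a fixed answer vocabulary, and
# external knowledge arrives as a pre-computed embedding (e.g. from a medical KG).
import torch
import torch.nn as nn

class KnowledgeEnhancedMedVQA(nn.Module):
    def __init__(self, vocab_size=1000, num_answers=500, dim=256):
        super().__init__()
        # Visual branch: a tiny CNN stand-in for a real backbone (CNN/ViT).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        # Language branch: word embeddings followed by an LSTM question encoder.
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.question_encoder = nn.LSTM(dim, dim, batch_first=True)
        # Knowledge branch: project externally retrieved knowledge embeddings
        # (e.g. TransE-style entity vectors) into the shared feature space.
        self.knowledge_proj = nn.Linear(dim, dim)
        # Late fusion by concatenation, then answer classification.
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_answers))

    def forward(self, image, question_ids, knowledge_emb):
        v = self.image_encoder(image)                          # (B, dim)
        _, (h, _) = self.question_encoder(self.word_emb(question_ids))
        q = h[-1]                                              # (B, dim)
        k = self.knowledge_proj(knowledge_emb)                 # (B, dim)
        return self.classifier(torch.cat([v, q, k], dim=-1))  # (B, num_answers)

model = KnowledgeEnhancedMedVQA()
logits = model(torch.randn(2, 3, 224, 224),      # batch of medical images
               torch.randint(0, 1000, (2, 12)),  # tokenized questions
               torch.randn(2, 256))              # retrieved knowledge vectors
print(logits.shape)  # torch.Size([2, 500])
```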



Acknowledgements

The present research was supported by the Fundamental Research Funds for the Central Universities under Grant No. 22120220069 and by the National Natural Science Foundation of China under Grant No. 62176185.

Author information


Corresponding author

Correspondence to Haofen Wang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, H., Du, H. (2023). Knowledge-Enhanced Medical Visual Question Answering: A Survey (Invited Talk Summary). In: Yang, S., Islam, S. (eds) Web and Big Data. APWeb-WAIM 2022 International Workshops. APWeb-WAIM 2022. Communications in Computer and Information Science, vol 1784. Springer, Singapore. https://doi.org/10.1007/978-981-99-1354-1_1

  • DOI: https://doi.org/10.1007/978-981-99-1354-1_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1353-4

  • Online ISBN: 978-981-99-1354-1

  • eBook Packages: Computer Science, Computer Science (R0)
