ABSTRACT
Medical visual question answering (MedVQA) aims to answer medical questions posed about a related medical image. Although the technology has vast potential in the medical domain, it remains difficult to apply in practice and has not been widely adopted, because accurate answer prediction requires a refined understanding of both the medical image and the question text. Existing methods fuse the whole image with the whole question to predict the answer. For a given question, however, the important information lies in only a small region of the image and a few critical words of the question, and the remaining information may interfere with answer prediction. To this end, we introduce an effective multi-modal co-attention network (MMCN) that learns the essential words in the question and the essential regions in the image. Each word and region is scored by an attention-weighting method, and the score indicates its importance during model reasoning. Experimental comparisons show that our MMCN outperforms state-of-the-art methods on the public VQA-RAD dataset.
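The attention-weighting idea described above can be sketched in a few lines: each word (or image region) feature is scored against a pooled query from the other modality, and a softmax turns the scores into importance weights. This is a minimal illustrative sketch, not the paper's actual MMCN architecture; the function names, dimensions, and scaled dot-product scoring are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(features, query):
    """Score each feature (word or region) against a query vector.

    features: (n, d) array, one row per question word or image region.
    query:    (d,) array, a pooled representation of the other modality.
    Returns a length-n weight vector summing to 1; a higher weight marks
    a word or region as more important for answer prediction.
    """
    # Scaled dot-product scoring (an assumption; MMCN's exact scoring
    # function is defined in the paper, not here).
    scores = features @ query / np.sqrt(features.shape[1])
    return softmax(scores)

# Toy example: 4 question-word embeddings attended by an image-side query.
rng = np.random.default_rng(0)
words = rng.normal(size=(4, 3))
img_query = rng.normal(size=3)
w = attention_weights(words, img_query)
print(w, w.sum())  # 4 non-negative weights that sum to 1
```

The same routine applies symmetrically: region features scored against a question-side query yield per-region weights, which is the co-attention pattern the abstract describes.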