Hybrid deep learning model for answering visual medical questions

Gasmi, Karim

doi:10.1007/s11227-022-04474-8

Published: 11 April 2022

Volume 78, pages 15042–15059, (2022)

The Journal of Supercomputing

Karim Gasmi ORCID: orcid.org/0000-0003-0138-2226¹

Abstract

The growing volume of electronic documents containing medical information makes searching for specific information complex and time-consuming, which has prompted the development of new tools to address the problem. Automated visual question answering (VQA) systems are becoming more challenging to develop. They are computer programs that take an image and a question as input and combine them to generate a text-based answer. Because the number of questions is enormous and the number of specialists is limited, many questions stay unanswered. This problem can be addressed with automatic question classifiers that route queries to experts according to their subject preferences. For these purposes, we propose a VQA approach based on a hybrid deep learning model. The model consists of three steps: (1) the classification of medical questions with a BERT model; (2) image and text feature extraction using a deep learning model, more specifically the extraction of medical image features by a hybrid deep learning model; and (3) text feature extraction using a Bi-LSTM model. Finally, to predict the appropriate answer, our approach uses a KNN model. This study also examines the influence of the Adam, AdaGrad, stochastic gradient descent (SGD), and RMSProp optimization techniques on the performance of the network. The experiments showed that the Adam and SGD optimization algorithms consistently produced better results. Experiments on the ImageCLEF 2019 dataset revealed that the proposed method considerably increases BLEU and WBSS values.
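The final step described above, answer prediction with a KNN model over extracted features, can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the article's method: the abstract does not state how image and question features are combined or which distance is used, so the sketch concatenates the two vectors and uses Euclidean distance with a majority vote. The feature vectors are hand-written toy values standing in for the outputs of the image model and the Bi-LSTM, and the names `fuse` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def fuse(image_features, question_features):
    """Concatenate image and question vectors (assumed fusion, not from the article)."""
    return list(image_features) + list(question_features)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_answers, query, k=3):
    """Return the most frequent answer among the k training vectors nearest to query."""
    ranked = sorted(
        zip(train_vectors, train_answers),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(answer for _, answer in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: each entry is a fused (image, question) vector with its answer.
train_vectors = [
    fuse([0.9, 0.1], [1.0, 0.0]),
    fuse([0.8, 0.2], [1.0, 0.0]),
    fuse([0.1, 0.9], [0.0, 1.0]),
    fuse([0.2, 0.8], [0.0, 1.0]),
]
train_answers = ["mri", "mri", "ct", "ct"]

# A new image/question pair whose features lie close to the "mri" examples.
query = fuse([0.85, 0.15], [1.0, 0.0])
predicted = knn_predict(train_vectors, train_answers, query, k=3)
print(predicted)
```

In a real pipeline the toy vectors would be replaced by learned features, and the choice of k and of the distance metric would need to be tuned on validation data.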

Author information

Authors and Affiliations

Department of Computer Science, College of Arts and Sciences at Tabarjal, Jouf University, Jouf, Saudi Arabia
Karim Gasmi

Corresponding author

Correspondence to Karim Gasmi.

About this article

Cite this article

Gasmi, K. Hybrid deep learning model for answering visual medical questions. J Supercomput 78, 15042–15059 (2022). https://doi.org/10.1007/s11227-022-04474-8

Accepted: 19 March 2022
Issue Date: September 2022

Part of a collection:

SI - New Trends in Autonomous Systems Engineering
