Abstract
Visual Question Answering (VQA) is a prominent task that combines computer vision and natural language processing techniques. Among all question types, counting questions such as "How many?" are widely regarded as the most challenging, and VQA models still have difficulty counting the objects present in natural images. Basic VQA techniques either classify answers from a fixed-length representation of the question and image or estimate counts by summing fractional counts over image segments, and the soft attention used in these methods is identified as a primary source of these difficulties. To circumvent this problem, this paper implements a new visual question answering system based on a counting scenario. First, standard benchmark datasets for visual question answering, each incorporating both images and questions, are gathered, and feature extraction is applied to both modalities. For the questions, text pre-processing is first performed through punctuation removal, stemming, and stop-word removal, after which word2vec features are extracted. Similarly, deep features of the given images are extracted from the pooling layer of a Deep Convolutional Neural Network (DCNN). The two feature sets are integrated and fed to an optimal feature selection procedure that retains the most significant features carrying unique information. This selection is handled by the proposed Parameter Improved-Elephant Herding Optimization (PI-EHO) algorithm: EHO requires little time and low computational complexity, can be applied to a wide range of engineering optimization problems, and can also tackle multilevel thresholding problems, and these advantages over conventional optimization algorithms motivate its choice in the designed method. Finally, answer generation is performed by a hybrid deep learning model combining a Deep Neural Network (DNN) and Long Short-Term Memory (LSTM), termed DN-LSTM, whose architecture is optimized by the proposed PI-EHO. The designed method is evaluated on different datasets and yields promising results compared with existing methods.
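To make the described pipeline concrete, the following is a minimal PyTorch sketch of the flow, not the authors' implementation: all class names, layer sizes, the toy stop-word list and suffix stemmer, the nn.Embedding stand-in for word2vec, and the random binary mask standing in for the PI-EHO-selected feature subset are illustrative assumptions.

```python
import re

import torch
import torch.nn as nn

# Toy stop-word list and suffix "stemmer"; the actual pre-processing would
# use a full NLP toolkit for stemming and stop-word removal.
STOPWORDS = {"how", "many", "are", "is", "the", "in", "on", "a", "an", "of"}

def preprocess(question: str) -> list:
    """Punctuation removal, stop-word removal, and a crude stand-in stemmer."""
    tokens = re.sub(r"[^\w\s]", "", question.lower()).split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [t[:-1] if t.endswith("s") else t for t in tokens]

class ImageBranch(nn.Module):
    """Toy DCNN; deep features are read off its pooling layer."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # pooling-layer features

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(img)).flatten(1)  # (batch, dim)

class AnswerHead(nn.Module):
    """Hybrid DN-LSTM head: LSTM over word vectors, DNN over fused features."""
    def __init__(self, word_dim: int = 64, img_dim: int = 128, n_answers: int = 16):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, 64, batch_first=True)
        self.dnn = nn.Sequential(
            nn.Linear(64 + img_dim, 128), nn.ReLU(),
            nn.Linear(128, n_answers),
        )

    def forward(self, word_vecs, img_feat, feature_mask):
        _, (h, _) = self.lstm(word_vecs)          # h[-1]: (batch, 64)
        fused = torch.cat([h[-1], img_feat], 1)   # integrated feature vector
        return self.dnn(fused * feature_mask)     # masked = "selected" features

# --- usage ---
tokens = preprocess("How many dogs are in the picture?")  # ['dog', 'picture']
vocab = {t: i for i, t in enumerate(tokens)}
embed = nn.Embedding(len(vocab), 64)                      # stand-in for word2vec
word_vecs = embed(torch.tensor([[vocab[t] for t in tokens]]))

img_feat = ImageBranch()(torch.randn(1, 3, 64, 64))
# Random binary mask: a placeholder for the subset PI-EHO would select.
feature_mask = (torch.rand(64 + 128) > 0.3).float()
logits = AnswerHead()(word_vecs, img_feat, feature_mask)  # (1, 16) answer scores
```

In the paper's method, the binary mask would be produced by PI-EHO searching for the feature subset that maximizes answer accuracy, and the same optimizer would also tune the DN-LSTM architecture; the random mask above only illustrates where that selection plugs into the pipeline.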
Data availability
The data underlying this article are available in the Visual Question Answering database at https://visualqa.org/download.html.
About this article
Cite this article
Welde, T.M., Liao, L. Design and development of counting-based visual question answering model using heuristic-based feature selection with deep learning. Artif Intell Rev 56, 8859–8888 (2023). https://doi.org/10.1007/s10462-022-10385-0