Abstract
Question Answering (QA) is a widely studied problem in Natural Language Processing (NLP). This paper extends question answering to cover both textual and visual inputs and integrates it with social networking sites (SNS) to improve their accessibility for visually impaired people. In the proposed approach, a question about an image, together with any accompanying text, is fed into a hybrid model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, and the answer with the highest predicted probability is returned. Questions and answers are open-ended visual and textual queries that target diverse regions of an image, including its fine-grained textual details. The resulting framework therefore requires a detailed understanding of the image, a harder task than producing image captions alone. The model achieved better results than comparable existing models, and using it makes interaction with SNS more efficient for visually impaired users.
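To make the described architecture concrete, the sketch below shows one plausible way to wire a CNN + LSTM hybrid for visual and textual question answering: the question is encoded with an LSTM, pre-extracted CNN image features are projected to the same width, the two are fused, and a softmax scores a fixed answer set so the most probable answer can be returned. This is a minimal illustration only; the vocabulary size, feature dimensionality, layer widths, and answer-set size are assumptions, not the authors' exact configuration.

```python
# Hypothetical CNN + LSTM visual question answering sketch.
# Assumes pre-extracted VGG-style image features (4096-d), a fixed
# question length, and a closed answer vocabulary; all hyperparameters
# below are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000      # assumed question vocabulary size
MAX_Q_LEN = 25          # assumed maximum question length (tokens)
IMG_FEAT_DIM = 4096     # assumed CNN feature size (e.g. a VGG fc layer)
NUM_ANSWERS = 1000      # assumed number of candidate answers

# Question branch: embed tokens and encode the sequence with an LSTM.
q_in = layers.Input(shape=(MAX_Q_LEN,), name="question_tokens")
q = layers.Embedding(VOCAB_SIZE, 300, mask_zero=True)(q_in)
q = layers.LSTM(512)(q)

# Image branch: project pre-extracted CNN features to the same width.
img_in = layers.Input(shape=(IMG_FEAT_DIM,), name="image_features")
v = layers.Dense(512, activation="tanh")(img_in)

# Fuse both modalities and score every candidate answer; the answer
# with the highest softmax probability is returned to the user.
fused = layers.Multiply()([q, v])
h = layers.Dense(1024, activation="tanh")(fused)
out = layers.Dense(NUM_ANSWERS, activation="softmax", name="answer")(h)

model = Model(inputs=[q_in, img_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

At inference time, under the same assumptions, one would tokenize and pad the question, extract the image features with the chosen CNN, call `model.predict([question_tokens, image_features])`, and take the argmax over the answer scores.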
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Pradhan, A., Shukla, P., Patra, P., Pathak, R., Jena, A.K. (2019). Enhancing Interaction with Social Networking Sites for Visually Impaired People by Using Textual and Visual Question Answering. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0