Abstract
Question Answering (QA) is a widely studied problem in Natural Language Processing (NLP). This paper extends question answering to cover both textual and visual inputs and integrates it with social networking sites (SNS) to improve their accessibility for visually impaired people. In the proposed approach, a question about an image, together with any accompanying text, is fed into a hybrid model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, and the answer with the highest predicted probability is returned. Questions and answers are open-ended visual and textual queries that target diverse regions of an image, including its fine-grained textual details. The resulting framework therefore requires a detailed understanding of the image, a harder task than producing image captions alone. The model achieved better results than comparable existing models, and using it makes interaction with SNS more efficient for visually impaired users.
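To make the described architecture concrete, the sketch below shows one plausible way to wire a CNN + LSTM hybrid for visual and textual question answering: the question is encoded with an LSTM, pre-extracted CNN image features are projected to the same width, the two are fused, and a softmax scores a fixed answer set so the most probable answer can be returned. This is a minimal illustration only; the vocabulary size, feature dimensionality, layer widths, and answer-set size are assumptions, not the authors' exact configuration.

```python
# Hypothetical CNN + LSTM visual question answering sketch.
# Assumes pre-extracted VGG-style image features (4096-d), a fixed
# question length, and a closed answer vocabulary; all hyperparameters
# below are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000      # assumed question vocabulary size
MAX_Q_LEN = 25          # assumed maximum question length (tokens)
IMG_FEAT_DIM = 4096     # assumed CNN feature size (e.g. a VGG fc layer)
NUM_ANSWERS = 1000      # assumed number of candidate answers

# Question branch: embed tokens and encode the sequence with an LSTM.
q_in = layers.Input(shape=(MAX_Q_LEN,), name="question_tokens")
q = layers.Embedding(VOCAB_SIZE, 300, mask_zero=True)(q_in)
q = layers.LSTM(512)(q)

# Image branch: project pre-extracted CNN features to the same width.
img_in = layers.Input(shape=(IMG_FEAT_DIM,), name="image_features")
v = layers.Dense(512, activation="tanh")(img_in)

# Fuse both modalities and score every candidate answer; the answer
# with the highest softmax probability is returned to the user.
fused = layers.Multiply()([q, v])
h = layers.Dense(1024, activation="tanh")(fused)
out = layers.Dense(NUM_ANSWERS, activation="softmax", name="answer")(h)

model = Model(inputs=[q_in, img_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

At inference time, under the same assumptions, one would tokenize and pad the question, extract the image features with the chosen CNN, call `model.predict([question_tokens, image_features])`, and take the argmax over the answer scores.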
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Pradhan, A., Shukla, P., Patra, P., Pathak, R., Jena, A.K. (2019). Enhancing Interaction with Social Networking Sites for Visually Impaired People by Using Textual and Visual Question Answering. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0