
DeepInsight: a CNN-based approach for machine reading comprehension in query answering systems and its applications

  • 1203: Applications of Advanced Artificial Intelligence in Multimedia and Information Security
Multimedia Tools and Applications

Abstract

Understanding and reading an unstructured text to answer queries posed on it, also called Machine Reading Comprehension (MRC), has been a hot research topic worldwide over the past few years. The problem has been tackled in many ways, each with its own strengths and pitfalls. MRC receives close attention because of its many applications, which are a pressing need given the ever-increasing volume of data in the modern world; however, no prior work has focused on the applications of MRC themselves. We propose DeepInsight, an efficient CNN-based machine reading comprehension model inspired by QANet, together with an assemblage of three case studies. Each case study describes a specific application of DeepInsight, along with the details of its implementation, the results of its analysis, and its shortcomings, so that developers can refer to it when building similar applications. The first case study is a fully implemented, end-to-end Android application that uploads documents to a server and accepts queries on them. The second is a mobile application that helps people with visual impairment comprehend documents. The third is a video query application that answers questions posed on video data, using a deep captioning model as its core. DeepInsight achieves an EM/F1 score of 77.0/86.3, outperforming present state-of-the-art models while keeping the inference time at 0.535 seconds, which is justifiable for real-world applications.
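
For readers unfamiliar with the EM/F1 figures quoted above, the following is a minimal sketch of how SQuAD-style Exact Match and token-level F1 are typically computed for extractive answers. It illustrates the standard metric definitions only; the function names and example strings below are illustrative and do not come from the authors' evaluation code.

import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD convention)."""
    s = "".join(ch for ch in s.lower() if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """EM is 1.0 when the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted span and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: a predicted span compared against a reference answer.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))              # 1.0 after normalization
print(round(f1_score("Eiffel Tower in Paris", "Eiffel Tower"), 2))  # 0.67

Per-question scores are averaged over the dataset (taking the maximum over the available reference answers for each question), which is how aggregate numbers such as 77.0 EM / 86.3 F1 are obtained.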

Author information

Corresponding author

Correspondence to Venkanna U.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest and that no human or animal testing or participation was involved in this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shukla, A., Chourasia, K., Jain, G. et al. DeepInsight: a CNN-based approach for machine reading comprehension in query answering systems and its applications. Multimed Tools Appl 83, 3313–3333 (2024). https://doi.org/10.1007/s11042-023-17732-5
