Abstract
Machine Reading Comprehension (MRC), the task of reading unstructured text and answering queries about it, has been an active research topic in the past few years. The problem has been tackled in many ways, each with its own strengths and weaknesses. MRC receives close attention because of its many applications, which are increasingly needed as the volume of data keeps growing. However, existing work has not focused on the applications of MRC. We propose DeepInsight, an efficient CNN-based machine reading comprehension model inspired by QANet, together with three case studies. Each case study covers a specific application of DeepInsight, along with its implementation details, analysis results, and shortcomings. Developers building similar applications can use these case studies as a reference. The first case study is a fully implemented end-to-end Android application that uploads documents to a server and accepts queries on them. The second is a mobile application that helps people with visual impairment comprehend documents. The third is a video query application that answers questions posed on video data, using a deep captioning model as its core. DeepInsight achieves an EM/F1 score of 77.0 / 86.3 and performs better than the present state-of-the-art models while keeping the inference time at 0.535 seconds, which is acceptable for real-world applications.
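The abstract reports EM and F1 without stating how they are computed. For readers unfamiliar with these metrics, the following is a minimal sketch of how exact match and token-level F1 are commonly computed for span-based question answering. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names are illustrative and are not taken from the DeepInsight implementation.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this assumed scheme, a dataset-level score such as the reported 77.0 / 86.3 would be the average of the per-question values, expressed as a percentage. EM rewards only exact spans, while F1 gives partial credit when the predicted span overlaps the reference.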





Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest and there was no human or animal testing or participation involved in this research.
About this article
Cite this article
Shukla, A., Chourasia, K., Jain, G. et al. DeepInsight: a CNN-based approach for machine reading comprehension in query answering systems and its applications. Multimed Tools Appl 83, 3313–3333 (2024). https://doi.org/10.1007/s11042-023-17732-5
DOI: https://doi.org/10.1007/s11042-023-17732-5