Abstract
Recent technological advances have significantly increased the quantity and accessibility of videos. Lower video acquisition costs and larger memory capacities now make it possible to store large video collections in computer systems. Exploiting these collections effectively requires tools that facilitate access and management. In this paper, we present a multimedia retrieval approach that puts the user’s needs first by starting from a text-based query. The approach has two main parts: (i) a new multi-level, deep-semantic indexing method based on video classification, and (ii) a query expansion mechanism combined with a relevance feedback system that refines the results using the user’s feedback. We demonstrate our contribution through the implementation of the Deep-VISEN prototype and experiments on a collection of 2700 videos and 62838 images. In these experiments, the approach proved effective and precise.
Cite this article
Hamroun, M., Lajmi, S., Jallouli, M. et al. Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval. Multimed Tools Appl 83, 55811–55850 (2024). https://doi.org/10.1007/s11042-023-17256-y
DOI: https://doi.org/10.1007/s11042-023-17256-y