Abstract
The description of audiovisual documents aims essentially at providing meaningful, explanatory information about their content. Despite the many efforts made by researchers to extract such descriptions, pertinent semantic descriptions are still lacking. In this paper, we introduce a new approach to improve the semantic description of cinematic audiovisual documents. To ensure a high description level, we combine different sources of information related to the content: the movie script and the text superimposed on the image. This process is mainly based on a semantic segmentation algorithm. The Structural Topic Model (STM) and the LSCOM ontology (Large Scale Concept Ontology for Multimedia, http://www.ee.columbia.edu/ln/dvmm/lscom/) are adapted for knowledge and description extraction. Deep classification techniques, such as LSTM (long short-term memory) networks and softmax regression, are used to classify generic topics into specific topics. The performance of the developed approach is assessed as follows. First, the STM is adapted and evaluated using the CMU Movie Summary corpus. Then, the topic detection and classification processes are applied, and their results are compared to human judgments on the MovieLens dataset. Finally, a quantitative evaluation is performed on the M-VAD (Montreal Video Annotation Dataset) [44] and MPII-MD (MPII Movie Description) [35] large-scale movie description datasets. The comparative study shows that the suggested approach outperforms existing ones in terms of the precision of the obtained topics.
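The abstract's final classification stage maps generic topics to specific ones with LSTM networks and softmax regression. As a minimal illustrative sketch of the softmax-regression part only (not the authors' implementation; the feature vectors and labels below are hypothetical toy data standing in for topic proportions), a multinomial logistic classifier can be trained by batch gradient descent as follows:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encode the specific-topic labels
    for _ in range(epochs):
        P = softmax(X @ W + b)          # class probabilities
        W -= lr * (X.T @ (P - Y)) / n   # gradient of cross-entropy w.r.t. W
        b -= lr * (P - Y).mean(axis=0)  # gradient w.r.t. bias
    return W, b

def predict(X, W, b):
    return softmax(X @ W + b).argmax(axis=1)

# Hypothetical 2-D "topic proportion" features for two specific topics
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 0.0], 0.2, (20, 2)),
               rng.normal([0.0, 1.0], 0.2, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

W, b = train_softmax_regression(X, y, n_classes=2)
acc = (predict(X, W, b) == y).mean()
```

In the paper's pipeline the inputs would instead be sequence representations produced by the LSTM; the softmax layer then assigns each generic topic to its most probable specific topic.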
References
Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Lawrence Zitnick C, Parikh D (2015) Vqa: Visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425–2433
Atkinson J, Gonzalez A, Munoz M, Astudillo H (2014) Web metadata extraction and semantic indexing for learning objects extraction. Appl Intell 41(2):649–664
Ballan L, Bertini M, Del Bimbo A, Seidenari L, Serra G (2011) Event detection and recognition for semantic annotation of video. Multimed Tools Appl 51(1):279–302
Basu S, Yu Y, Singh VK, Zimmermann R (2016) Videopedia: Lecture video recommendation for educational blogs using topic modeling. Springer, Cham, pp 238–250
Bellegarda JR (1997) A latent semantic analysis framework for large-span language modeling. In: EUROSPEECH
Ben-Ahmed O, Huet B (2018) Deep multimodal features for movie genre and interestingness prediction. In: 2018 International conference on content-based multimedia indexing (CBMI). IEEE, pp 1–6
Bougiatiotis K, Giannakopoulos T (2016) Content representation and similarity of movies based on topic extraction from subtitles. In: Proceedings of the 9th Hellenic conference on artificial intelligence. ACM, pp 1–7
Chang X, Yang Y, Hauptmann A, Xing EP, Yu YL (2015) Semantic concept discovery for large-scale zero-shot event detection. In: Twenty-fourth international joint conference on artificial intelligence
Chen D, Dolan WB (2011) Collecting highly parallel data for paraphrase evaluation. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 190–200
Chen X, Zou D, Cheng G, Xie H (2020) Detecting latent topics and trends in educational technologies over four decades using structural topic modeling: a retrospective of all volumes of computers & education. Comput Educ 151:103855
Dascalu M, Dessus P, Trausan-matu S, Bianco M, Nardy A (2013) Readerbench, an environment for analyzing text complexity and reading strategies. In: Artif Intell Educ. Springer, pp 379–388
Denkowski M, Lavie A (2014) Meteor universal: Language specific translation evaluation for any target language. In: Proceedings of the ninth workshop on statistical machine translation, pp 376–380
Fang Z, Liu J, Li Y, Qiao Y, Lu H (2019) Improving visual question answering using dropout and enhanced question encoder. Pattern Recogn 90:404–414
Fourati M, Jedidi A, Gargouri F (2017) Generic descriptions for movie document: an experimental study. In: 2017 IEEE/ACS 14Th international conference on computer systems and applications (AICCSA). IEEE, pp 766–773
Fourati M, Jedidi A, Gargouri F (2020) A survey on description and modeling of audiovisual documents. Multimed Tools Appl 79(45):33519–33546
Fourati M, Jedidi A, Hassin HB, Gargouri F (2015) Towards fusion of textual and visual modalities for describing audiovisual documents. Inter J Multimed Data Eng Manag (IJMDEM) 6(2):52–70
Gan Z, Gan C, He X, Pu Y, Tran K, Gao J, Carin L, Deng L (2017) Semantic compositional networks for visual captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5630–5639
Gharbi H, Bahroun S, Zagrouba E (2019) Key frame extraction for video summarization using local description and repeatability graph clustering. SIViP 13(3):507–515
Hamroun M, Tamine K, Crespin B (2021) Multimodal video indexing (MVI): a new method based on machine learning and semi-automatic annotation on large video collections. Int J Image Graph: 2250022
Hao X, Zhou F, Li X (2020) Scene-edge gru for video caption. In: 2020 IEEE 4Th information technology, networking, electronic and automation control conference (ITNEC). IEEE, vol 1, pp 1290–1295
Harispe S, Sánchez D, Ranwez S, Janaqi S, Montmain J (2014) A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J Biomed Inform 48:38–53
He Y, Li Y, Lei J, Leung C (2016) A framework of query expansion for image retrieval based on knowledge base and concept similarity. Neurocomputing (in press)
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huang Q, Xiong Y, Rao A, Wang J, Lin D (2020) MovieNet: a holistic dataset for movie understanding. In: Computer vision – ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part IV. Springer, pp 709–727
Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, Zhao L (2019) Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl 78(11):15169–15211
Li L, Tang S, Zhang Y, Deng L, Tian Q (2017) Gla: Global–local attention for image description. IEEE Trans Multimedia 20(3):726–737
Li X, Zhang J, Ouyang J (2019) Dirichlet multinomial mixture with variational manifold regularization: Topic modeling over short texts. In: Proceedings of the AAAI Conference on artificial intelligence, vol 33, pp 7884–7891
Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81
Luo B, Li H, Meng F, Wu Q, Huang C (2017) Video object segmentation via global consistency aware query strategy. IEEE Trans Multimed 19(7):1482–1493
Matthews P (2019) Human-in-the-loop topic modelling: Assessing topic labelling and genre-topic relations with a movie plot summary corpus. In: The human position in an artificial world: creativity, ethics and AI in knowledge organization. Ergon-verlag, pp 181–207
Matthews P, Glitre K (2021) Genre analysis of movies using a topic model of plot summaries. J Assoc Inf Sci 72:1–17
Mocanu B, Tapu R, Tapu E (2016) Video retrieval using relevant topics extraction from movie subtitles. In: 12Th IEEE international symposium on electronics and telecommunications (ISETC), 2016. IEEE, pp 327–330
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp 311–318
Roberts ME, Stewart BM, Tingley D (2019) Stm: an r package for structural topic models. J Stat Softw 91(1):1–40
Rohrbach A, Rohrbach M, Tandon N, Schiele B (2015) A dataset for movie description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3202–3212
Rotman D, Porat D, Ashour G (2016) Robust and efficient video scene detection using optimal sequential grouping. In: 2016 IEEE International symposium on multimedia (ISM). IEEE, pp 275–280
Rotman D, Porat D, Ashour G (2017) Robust video scene detection using multimodal fusion of optimally grouped features. In: 2017 IEEE 19Th international workshop on multimedia signal processing (MMSP). IEEE, pp 1–6
Sadique MF, Rahman MA, Haque SR (2020) Content based unsupervised video summarization using birds foraging search. In: 2020 11Th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7
Sanchez-Nielsen E, Chavez-Gutierrez F, Lorenzo-Navarro J (2019) A semantic parliamentary multimedia approach for retrieval of video clips with content understanding. Multimedia Systems 25:337–354
Shah R, Zimmermann R (2017) Multimodal analysis of user-generated multimedia content. Springer
Song J, Guo Y, Gao L, Li X, Hanjalic A, Shen HT (2018) From deterministic to generative: Multimodal stochastic rnns for video captioning. IEEE Trans Neural Netw Learn Syst 30(10):3047–3058
Stappen L, Baird A, Cambria E, Schuller BW (2021) Sentiment analysis and topic recognition in video transcriptions. IEEE Intell Syst 36(2):88–95
Tang P, Wang C, Wang X, Liu W, Zeng W, Wang J (2019) Object detection in videos by high quality object linking. IEEE Trans Pattern Anal Mach Intell 42(5):1272–1278
Torabi A, Pal C, Larochelle H, Courville A (2015) Using descriptive video services to create a large data source for video annotation research. arXiv:1503.01070
Trojahn TH, Goularte R (2021) Temporal video scene segmentation using deep-learning. Multimed Tools Appl 80(12):17487–17513
Tsai WL (2021) A cooperative mechanism for managing multimedia project documentation. Multimed Tools Appl, pp 1–14
Vedantam R, Lawrence Zitnick C, Parikh D (2015) Cider: Consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575
Wang H, Gao C, Han Y (2020) Sequence in sequence for video captioning. Pattern Recogn Lett 130:327–334
Xu J, Mei T, Yao T, Rui Y (2016) Msr-vtt: a large video description dataset for bridging video and language. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5288–5296
Yang H, Meinel C (2014) Content based lecture video retrieval using speech and video text information. IEEE Trans Learn Technol 7(2):142–154
Yang Y, Zhou J, Ai J, Bin Y, Hanjalic A, Shen HT, Ji Y (2018) Video captioning by adversarial lstm. IEEE Trans Image Process 27(11):5600–5611
Ye G, Li Y, Xu H, Liu D, Chang SF (2015) Eventnet: a large scale structured concept library for complex event detection in video. In: Proceedings of the 23rd ACM international conference on Multimedia. ACM, pp 471–480
Zhao B, Li X, Lu X (2019) Cam-rnn: Co-attention model based rnn for video captioning. IEEE Trans Image Process 28(11):5552–5565
Zhou W, Li H, Tian Q (2017) Recent advance in content-based image retrieval: A literature survey. arXiv:1706.06064
Cite this article
Fourati, M., Jedidi, A. & Gargouri, F. A deep learning-based classification for topic detection of audiovisual documents. Appl Intell 53, 8776–8798 (2023). https://doi.org/10.1007/s10489-022-03938-x