Abstract
Exploring the content of a video is typically inefficient because of the linear, streamed nature of the medium and its lack of interactivity. A video can be seen as a combination of features: the visual track, the audio track, a transcription of the spoken words, and so on. These features can be viewed as a set of temporally bounded parallel modalities. It is our contention that these modalities and the features derived from them can be presented individually, or in discrete combinations, to enable deeper, more effective, and interactive exploration of the content of different parts of a video. This paper proposes a novel system for video exploration that offers the video's content in an alternative representation: the extracted multimodal features are rendered as an automatically generated interactive multimedia webpage. The paper also presents a user study conducted to learn the usage patterns of the proposed system. The learned usage patterns may be used to build a template-driven representation engine that exploits the features to offer a multimodal synopsis of a video, supporting more efficient exploration of its content.
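As a rough illustration only (not taken from the paper), the abstract's notion of a video as temporally bounded parallel modalities, selectable individually or in discrete combination, could be sketched as a minimal data structure. All names here (`Segment`, `select`) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A temporally bounded feature from one modality of the video."""
    modality: str   # e.g. "visual", "audio", "transcript"
    start: float    # segment start time in seconds
    end: float      # segment end time in seconds
    content: str    # extracted feature value (keyword, label, text span)

def select(segments, modalities, t0, t1):
    """Return segments of the chosen modalities that overlap [t0, t1]."""
    return [s for s in segments
            if s.modality in modalities and s.start < t1 and s.end > t0]

# Example: explore only the transcript and visual tracks of one part of a video.
segments = [
    Segment("transcript", 0.0, 12.5, "introduction"),
    Segment("visual", 3.0, 8.0, "slide: title"),
    Segment("audio", 0.0, 12.5, "speech"),
]
chosen = select(segments, {"transcript", "visual"}, 2.0, 10.0)
```

In this sketch, a representation engine would render the selected segments (here, `chosen`) into an interactive page rather than replaying the full linear stream.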
Notes
https://techcrunch.com/2017/02/28/people-now-watch-1-billion-hours-of-youtube-per-day/ (last verified: October 2017).
This research is supported by the ADAPT Centre for Digital Content Technology at the School of Computer Science and Statistics, Trinity College Dublin, Ireland. ADAPT is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. The work is also supported by the EU H2020 project SAAM (Grant No. 769661) at the University of Edinburgh, UK, and by Science Foundation Ireland (Grant 12/CE/I2267).
Cite this article
Salim, F.A., Haider, F., Conlan, O. et al. An approach for exploring a video via multimodal feature extraction and user interactions. J Multimodal User Interfaces 12, 285–296 (2018). https://doi.org/10.1007/s12193-018-0268-0