Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018

Authors:

Gregor Kovalčík,

Klaus Schöffmann,

Stefanos Vrochidis,

Phuong Anh Nguyen,

Sitapa Rujikietgumjorn,

Kai Uwe Barthel

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 1

Article No.: 29, Pages 1 - 18

https://doi.org/10.1145/3295663

Published: 13 February 2019

Abstract

This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS) competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the searched scenes were either previously observed or described by another person (i.e., an example shot is not available). During the event, nine teams competed with their video retrieval tools in providing access to a shared video collection with 600 hours of video content. Evaluation objectives, rules, scoring, tasks, and all participating tools are described in the article. In addition, we provide some insights into how the different teams interacted with their video browsers, which was made possible by a novel interaction logging mechanism introduced for this iteration of the VBS. The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches that were showcased during the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, the tools were mostly able to find at least one scene, whereas recall was the issue for many teams. The logs also reveal that even though recent exciting advances in machine learning narrow the classical semantic gap problem, user-centric interfaces are still required to mediate access to specific content. Finally, open challenges and lessons learned are presented for future VBS events.

Index Terms

1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Video search

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 1

February 2019

265 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3309717

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 February 2019

Accepted: 01 November 2018

Revised: 01 November 2018

Received: 01 July 2018

Published in TOMM Volume 15, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Swiss National Science Foundation (SNSF)
European Regional Development Fund and the Carinthian Economic Promotion Fund (KWF)
Universität Klagenfurt and Lakeside Labs GmbH, Klagenfurt, Austria
Council of the Hong Kong Special Administrative Region, China
Czech Science Foundation (GAČR)
Horizon 2020 Research and Innovation Programme V4Design
CHIST-ERA project IMOTION
