Abstract
This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS) competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the searched scenes were either previously observed or described by another person (i.e., an example shot is not available). During the event, nine teams competed with their video retrieval tools in providing access to a shared video collection with 600 hours of video content. Evaluation objectives, rules, scoring, tasks, and all participating tools are described in the article. In addition, we provide some insights into how the different teams interacted with their video browsers, which was made possible by a novel interaction logging mechanism introduced for this iteration of the VBS. The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches that were showcased during the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, the tools were mostly able to find at least one scene, whereas recall was the issue for many teams. The logs also reveal that even though recent exciting advances in machine learning narrow the classical semantic gap problem, user-centric interfaces are still required to mediate access to specific content. Finally, open challenges and lessons learned are presented for future VBS events.
Index Terms
- Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018