Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018

Published: 13 February 2019

Abstract

This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS) competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the searched scenes were either previously observed or described by another person (i.e., an example shot is not available). During the event, nine teams competed with their video retrieval tools in providing access to a shared video collection with 600 hours of video content. Evaluation objectives, rules, scoring, tasks, and all participating tools are described in the article. In addition, we provide some insights into how the different teams interacted with their video browsers, which was made possible by a novel interaction logging mechanism introduced for this iteration of the VBS. The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches that were showcased during the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, the tools were mostly able to find at least one scene, whereas recall was the issue for many teams. The logs also reveal that even though recent exciting advances in machine learning narrow the classical semantic gap problem, user-centric interfaces are still required to mediate access to specific content. Finally, open challenges and lessons learned are presented for future VBS events.

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1
February 2019
265 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3309717

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 13 February 2019
• Revised: 1 November 2018
• Accepted: 1 November 2018
• Received: 1 July 2018

Qualifiers

• research-article
• Research
• Refereed
