Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14557)

Abstract

Vibro (Video Browser) is an interactive video retrieval system and the winner of the Video Browser Showdown in 2022 and 2023. This paper gives an overview of the concepts underlying the tool and highlights the changes implemented for the 2024 competition. Additionally, we propose a way to evaluate retrieval engine performance for the specific use case of known-item search using logged query data from previous competitions. This evaluation helps identify the most suitable embedding model for image-to-image and text-to-image retrieval and informs decisions on how to improve the current system.
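The full evaluation protocol is not part of this preview, so the following is only a minimal sketch of how a log-based known-item search evaluation could look, not the authors' method. It assumes each logged query has already been embedded and is paired with the ID of the item the user was searching for. The names `cosine`, `target_rank`, and `evaluate`, the toy vectors, and the choice of recall@k and mean reciprocal rank as metrics are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def target_rank(query_vec, item_vecs, target_id):
    """1-based rank of the known item when all items are sorted by similarity."""
    target_score = cosine(query_vec, item_vecs[target_id])
    # Every item that scores strictly higher pushes the target one place down.
    better = sum(
        1
        for item_id, vec in item_vecs.items()
        if item_id != target_id and cosine(query_vec, vec) > target_score
    )
    return better + 1


def evaluate(logged_queries, item_vecs, ks=(1, 10)):
    """Return (metrics, ranks) for a list of (query_vector, target_id) pairs."""
    if not logged_queries:
        raise ValueError("need at least one logged query")
    ranks = [target_rank(q, item_vecs, t) for q, t in logged_queries]
    n = len(ranks)
    metrics = {f"recall@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n
    return metrics, ranks


# Toy data (assumption): three indexed keyframes and two logged queries,
# all embedded by the same hypothetical model into a 2-D space.
item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
logged_queries = [([1.0, 0.1], "a"), ([0.2, 1.0], "c")]

report, ranks = evaluate(logged_queries, item_vecs)
print(ranks)   # rank of the known item for each logged query
print(report)  # aggregated metrics for this embedding model
```

Running the same logged queries through several candidate embedding models and comparing the resulting metrics is one way such logs could support model selection. Lower ranks for the known items suggest a better fit for this use case.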

Author information

Corresponding author

Correspondence to Konstantin Schall.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schall, K., Hezel, N., Barthel, K.U., Jung, K. (2024). Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_33

  • DOI: https://doi.org/10.1007/978-3-031-53302-0_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53301-3

  • Online ISBN: 978-3-031-53302-0

  • eBook Packages: Computer Science, Computer Science (R0)
