Abstract
Vibro (Video Browser) is an interactive video retrieval system and the winner of the Video Browser Showdown in 2022 and 2023. This paper gives an overview of the tool's underlying concepts and highlights the changes implemented for the 2024 competition. Additionally, we propose a way to evaluate retrieval engine performance for the specific use case of known-item search, using logged query data from previous competitions. This evaluation helps identify the best candidate embedding model for image-to-image and text-to-image retrieval and guides decisions on how to improve the current system.
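As a rough illustration of what a known-item search evaluation on logged queries could look like, the sketch below ranks every item in a collection against each logged query embedding and records where the known target lands. It is not the procedure from the paper: the function names, the toy two-dimensional vectors, the use of cosine similarity, and the choice of recall@k and mean reciprocal rank as summary metrics are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def target_rank(query_vec, item_vecs, target_id):
    """1-based rank of the known item among all items for one query."""
    target_score = cosine(query_vec, item_vecs[target_id])
    # Count items that score strictly higher than the target.
    better = sum(
        1
        for item_id, vec in item_vecs.items()
        if item_id != target_id and cosine(query_vec, vec) > target_score
    )
    return better + 1

def evaluate(logged_queries, item_vecs, k=1):
    """Return per-query ranks, recall@k, and mean reciprocal rank.

    logged_queries: list of (query_embedding, target_item_id) pairs
    (hypothetical log format, assumed for this sketch).
    """
    ranks = [target_rank(q, item_vecs, t) for q, t in logged_queries]
    recall_at_k = sum(r <= k for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return ranks, recall_at_k, mrr

# Toy collection and query log (made-up values, for illustration only).
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
log = [([0.9, 0.1], "a"), ([0.2, 0.8], "c")]

ranks, recall_at_1, mrr = evaluate(log, items, k=1)
print(ranks, recall_at_1, mrr)
```

Running the same query log against the embeddings produced by different candidate models would, under these assumptions, give comparable numbers per model; a lower target rank means fewer results a user has to inspect before finding the item.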
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schall, K., Hezel, N., Barthel, K.U., Jung, K. (2024). Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_33
DOI: https://doi.org/10.1007/978-3-031-53302-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53301-3
Online ISBN: 978-3-031-53302-0
eBook Packages: Computer Science (R0)