Abstract
This paper describes our frame-based Ad-hoc Video Search system with manually assisted querying, which will be used at the Video Browser Showdown 2021 (VBS2021). The main contributions of the new system are an improved automatic keywording component, visual feature vectors fine-tuned for image retrieval, and an improved visual presentation of the search results. Additionally, we use a more powerful joint textual/visual search engine based on Lucene, which can search by the temporal sequence of textual or visual properties of the video frames.
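To illustrate what a search by temporal sequence of frame properties could look like, the following is a minimal sketch in plain Python. It is not the authors' Lucene-based engine: the `Frame` type, the `temporal_search` function, and the `max_gap` parameter are hypothetical names introduced here, and the sketch assumes each frame already carries a set of keywords.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    video_id: str   # hypothetical identifier of the source video
    index: int      # position of the frame within its video
    keywords: set   # keywords assigned to this frame (assumed given)


def temporal_search(frames, query_terms, max_gap=5):
    """Return the sorted ids of videos in which the query terms occur in order.

    Consecutive terms must match frames that are at most `max_gap`
    frame positions apart (an assumption made for this sketch).
    """
    if not query_terms:
        return []

    # Group frames by video so that sequences never span two videos.
    by_video = {}
    for frame in frames:
        by_video.setdefault(frame.video_id, []).append(frame)

    hits = []
    for video_id, vframes in by_video.items():
        # Frame indices at which the sequence matched so far can end.
        ends = {f.index for f in vframes if query_terms[0] in f.keywords}
        for term in query_terms[1:]:
            ends = {
                f.index
                for f in vframes
                if term in f.keywords
                and any(0 < f.index - e <= max_gap for e in ends)
            }
        if ends:
            hits.append(video_id)
    return sorted(hits)
```

Tracking every possible end position, rather than greedily taking the first match for each term, avoids missing sequences in which a later occurrence of one term is the only one close enough to the next term.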
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hezel, N., Schall, K., Jung, K., Barthel, K.U. (2021). Video Search with Sub-Image Keyword Transfer Using Existing Image Archives. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_49
DOI: https://doi.org/10.1007/978-3-030-67835-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science (R0)