Video Search with Sub-Image Keyword Transfer Using Existing Image Archives

Hezel, Nico; Schall, Konstantin; Jung, Klaus; Barthel, Kai Uwe

doi:10.1007/978-3-030-67835-7_49

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

This paper presents details of our frame-based Ad-hoc Video Search system with manually assisted querying that will be used for the Video Browser Showdown 2021 (VBS2021). The main contributions of our new system consist of an improved automatic keywording component, better visual feature vectors which have been fine-tuned for the task of image retrieval, and an improved visual presentation of the search results. Additionally, we use a more powerful joint textual/visual search engine based on Lucene, which can perform a search according to the temporal sequence of textual or visual properties of the video frames.

Author information

Authors and Affiliations

HTW Berlin, University of Applied Sciences - Visual Computing Group, Wilhelminenhofstraße 75, 12459, Berlin, Germany
Nico Hezel, Konstantin Schall, Klaus Jung & Kai Uwe Barthel

Corresponding author

Correspondence to Nico Hezel.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

Cite this paper

Hezel, N., Schall, K., Jung, K., Barthel, K.U. (2021). Video Search with Sub-Image Keyword Transfer Using Existing Image Archives. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_49

DOI: https://doi.org/10.1007/978-3-030-67835-7_49
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science (R0)
