VIVA: visual information retrieval in video archives

Mühling, Markus; Korfhage, Nikolaus; Pustu-Iren, Kader; Bars, Joanna; Knapp, Mario; Bellafkir, Hicham; Vogelbacher, Markus; Schneider, Daniel; Hörth, Angelika; Ewerth, Ralph; Freisleben, Bernd

doi:10.1007/s00799-022-00337-y


Published: 10 September 2022

International Journal on Digital Libraries, volume 23, pages 319–333 (2022)

Markus Mühling¹ (ORCID: orcid.org/0000-0001-7391-264X),
Nikolaus Korfhage¹,
Kader Pustu-Iren²,
Joanna Bars⁴,
Mario Knapp¹,
Hicham Bellafkir¹,
Markus Vogelbacher¹,
Daniel Schneider¹,
Angelika Hörth⁴,
Ralph Ewerth²,³ &
Bernd Freisleben¹


Abstract

Video retrieval methods, e.g., for visual concept classification, person recognition, and similarity search, are essential for fine-grained semantic search in large video archives. However, such retrieval methods often have to be adapted to the users’ changing search requirements: Which concepts or persons are frequently searched for? Which research topics are currently important or will be relevant in the future? In this paper, we present VIVA, a software tool for building content-based video retrieval methods based on deep learning models. VIVA allows non-expert users to conduct visual information retrieval for concepts and persons in video archives and to add new persons or concepts to the underlying deep learning models as new requirements arise. For this purpose, VIVA provides a novel semi-automatic data acquisition workflow that includes a web crawler, image similarity search, and review and user feedback components to reduce the time-consuming manual effort of collecting training samples. We present experimental retrieval results using VIVA for four use cases in the context of a historical video collection of the German Broadcasting Archive based on about 34,000 hours of television recordings from the former German Democratic Republic (GDR). We evaluate the performance of deep learning models built using VIVA for 91 GDR-specific concepts and 98 personalities from the former GDR, as well as the performance of the image and person similarity search approaches.
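To make the role of image similarity search in such a data acquisition workflow more concrete, the following minimal sketch ranks crawled candidate images by their similarity to a few confirmed seed examples, so that a reviewer sees the most promising candidates first. It is an illustration under stated assumptions, not the method used in VIVA: the abstract only names the workflow components. The function names, file names, and three-dimensional toy feature vectors are hypothetical, and cosine similarity is an assumed choice of measure; in practice the vectors would come from a deep learning model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    seeds: List[Sequence[float]],
    candidates: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any seed.

    Returns the top_k (name, score) pairs in descending score order.
    """
    if not seeds:
        raise ValueError("at least one seed vector is required")
    scored = [
        (name, max(cosine_similarity(vec, seed) for seed in seeds))
        for name, vec in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy features; real features would be produced by a deep model.
seeds = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
candidates = {
    "crawl_001.jpg": [0.8, 0.2, 0.0],
    "crawl_002.jpg": [0.0, 1.0, 0.0],
    "crawl_003.jpg": [0.0, 0.0, 1.0],
}
review_queue = rank_candidates(seeds, candidates, top_k=2)
```

Here `review_queue` holds the two candidates closest to the seed set. A review component could present them to the user, whose accept or reject decisions would then extend the seed set for the next round.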



Acknowledgements

This work is financially supported by the German Research Foundation (DFG project number 388420599).

Author information

Authors and Affiliations

¹ Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, 35032, Marburg, Germany
Markus Mühling, Nikolaus Korfhage, Mario Knapp, Hicham Bellafkir, Markus Vogelbacher, Daniel Schneider & Bernd Freisleben
² TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hannover, Germany
Kader Pustu-Iren & Ralph Ewerth
³ L3S Research Center, Leibniz University Hannover, Appelstr. 4, 30167, Hannover, Germany
Ralph Ewerth
⁴ German Broadcasting Archive, Marlene-Dietrich-Allee 20, 14482, Potsdam, Germany
Joanna Bars & Angelika Hörth


Corresponding author

Correspondence to Markus Mühling.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Mühling, M., Korfhage, N., Pustu-Iren, K. et al. VIVA: visual information retrieval in video archives. Int J Digit Libr 23, 319–333 (2022). https://doi.org/10.1007/s00799-022-00337-y


Received: 15 March 2021
Revised: 27 June 2022
Accepted: 05 July 2022
Published: 10 September 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00799-022-00337-y
