Abstract
The W2VV++ model BoW variant integrated to VIRET and SOMHunter systems has proven its effectiveness in the previous Video Browser Showdown competition in 2020. As a next experimental interactive search prototype to benchmark, we consider a simple system relying on the more complex BERT variant of the W2VV++ model, accepting a rich text input. The input can be provided by keyboard or by speech processed by a third-party cloud service. The motivation for the more complex BERT variant is its good performance for rich text descriptions that can be provided for known-item search tasks. At the same time, users will be instructed to specify as rich text description about the searched scene as possible.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
Note that we are aware that reverse events could be quite common within a single video as similar cuts may repeat frequently. Nonetheless, this would still reduce the task at hand to merely finding the right scene within a video.
- 2.
Note that padding is employed on the edges of individual videos.
References
Alateeq, A., Roantree, M., Gurrin, C.: Voxento: a prototype voice-controlled interactive search engine for lifelogs. In: Proceedings of the Third Annual Workshop on Lifelog Search Challenge, LSC 2020, pp. 77–81. ACM, New York (2020)
Blažek, A., Lokoč, J., Skopal, T.: Video retrieval with feature signature sketches. In: Traina, A.J.M., Traina, C., Cordeiro, R.L.F. (eds.) SISAP 2014. LNCS, vol. 8821, pp. 25–36. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11988-5_3
Cobârzan, C., et al.: Interactive video search tools: a detailed analysis of the video browser showdown 2015. Multimed. Tools Appl. 76(4), 5539–5571 (2016). https://doi.org/10.1007/s11042-016-3661-2
Hirzel, M., Schneider, S., Tangwongsan, K.: Sliding-window aggregation algorithms: tutorial. In: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, pp. 11–14. ACM (2017)
Klement, E.P., Mesiar, R., Pap, E.: Families of t-norms. In: Klement, E.P., Mesiar, R., Pap, E. (eds.) Triangular Norms, vol. 8, pp. 101–119. Springer, Dordrecht (2000). https://doi.org/10.1007/978-94-015-9540-7_4
Kratochvíl, M., Veselý, P., Mejzlík, F., Lokoč, J.: SOM-hunter: video browsing with relevance-to-SOM feedback loop. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 790–795. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_71
Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, 21–25 October 2019, pp. 1786–1794 (2019)
Lokoč, J., et al.: A W2VV++ case study with automated and interactive text-to-video retrieval. In: Proceedings of the 28th ACM International Conference on Multimedia, MM 2020. ACM, New York (2020)
Lokoč, J., Bailer, W., Schoeffmann, K., Münzer, B., Awad, G.: On influential trends in interactive video retrieval: video browser showdown 2015–2017. IEEE Trans. Multimed. 20(12), 3361–3376 (2018)
Lokoč, J., et al.: Interactive search or sequential browsing? A detailed analysis of the video browser showdown 2018. ACM Trans. Multimed. Comput. Commun. Appl. 15(1), 29:1–29:18 (2019)
Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: A framework for effective known-item search in video. In: Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, pp. 1777–1785. ACM, New York (2019)
Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: VIRET: a video retrieval tool for interactive known-item search. In: Proceedings of the 2019 on International Conference on Multimedia Retrieval, ICMR 2019, pp. 177–181. ACM, New York (2019)
Mettes, P., Koelma, D.C., Snoek, C.G.M.: Shuffled imagenet banks for video event detection and search. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 16(2), 1–21 (2020)
Nguyen, P.A., Wu, J., Ngo, C.-W., Francis, D., Huet, B.: VIREO @ video browser showdown 2020. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 772–777. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_68
Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3C – a research video collection. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019. LNCS, vol. 11295, pp. 349–360. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05710-7_29
Sauter, L., Amiri Parian, M., Gasser, R., Heller, S., Rossetto, L., Schuldt, H.: Combining boolean and multimedia retrieval in vitrivr for large-scale video search. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 760–765. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_66
Yuan, J., et al.: Video browser showdown by NUS. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, C.-W., Andreopoulos, Y., Breiteneder, C. (eds.) MMM 2012. LNCS, vol. 7131, pp. 642–645. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27355-1_64
Acknowledgements
This paper has been supported by the Charles University Grant Agency (GA UK) project number 1310920, by Czech Science Foundation (GAČR) project 19-22071Y and by Charles University grant SVV-260588.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Peška, L., Kovalčík, G., Souček, T., Škrhák, V., Lokoč, J. (2021). W2VV++ BERT Model at VBS 2021. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science(), vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_46
Download citation
DOI: https://doi.org/10.1007/978-3-030-67835-7_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer ScienceComputer Science (R0)