
W2VV++ BERT Model at VBS 2021

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Abstract

The bag-of-words (BoW) variant of the W2VV++ model, integrated into the VIRET and SOMHunter systems, proved its effectiveness at the previous Video Browser Showdown (VBS) competition in 2020. As the next experimental interactive search prototype to benchmark, we consider a simple system that relies on the more complex BERT variant of the W2VV++ model and accepts rich text input. The input can be typed on a keyboard or spoken, in which case the speech is processed by a third-party cloud service. We chose the more complex BERT variant because it performs well on rich text descriptions, such as those that can be provided for known-item search tasks. In addition, users will be instructed to describe the searched scene in as much detail as possible.
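The abstract does not spell out how the model output is turned into a result list. As a rough illustration only, the sketch below assumes the common joint-embedding setup for text-to-video search: the text query has already been encoded into a vector (the W2VV++ encoders themselves are not shown), each keyframe has a precomputed feature vector of the same dimension, and keyframes are ranked by cosine similarity to the query. The function names `cosine` and `rank_frames` and the toy vectors are hypothetical and are not taken from the authors' system.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_frames(query_vec: Vector, frame_vecs: List[Vector], k: int = 5) -> List[Tuple[int, float]]:
    """Return (frame index, score) pairs for the k keyframes most similar to the query.

    Hypothetical helper: query_vec stands in for an already encoded text query,
    frame_vecs for precomputed keyframe features in the same embedding space.
    """
    scored = [(i, cosine(query_vec, f)) for i, f in enumerate(frame_vecs)]
    # Highest similarity first; Python's sort stays stable with reverse=True,
    # so frames with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [1.0, 0.0]                              # toy "encoded query"
    frames = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]   # toy keyframe features
    print(rank_frames(query, frames, k=2))
```

With these toy vectors, frame 1 (nearly parallel to the query) ranks first and frame 2 second. In an interactive tool, a ranking like this would typically only seed the initial result display, which the user then browses or refines.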

Notes

  1. We are aware that reverse events could be quite common within a single video, as similar cuts may repeat frequently. Nonetheless, this would still reduce the task at hand to merely finding the right scene within the video.

  2. Padding is applied at the edges of individual videos.
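The second note does not say which computation the padding belongs to. Purely as an illustration of the idea, the sketch below assumes that per-frame relevance scores are aggregated over a sliding temporal window (here, the maximum over ±`radius` frames). Padding each video separately, rather than the whole concatenated collection, keeps a window near the start or end of one video from picking up frames of its neighbour. The function name `windowed_max_per_video`, the max aggregation, and the `-inf` pad value are hypothetical choices, not the authors' method.

```python
from itertools import groupby
from typing import List, Sequence


def windowed_max_per_video(
    scores: Sequence[float],
    video_ids: Sequence[int],
    radius: int,
    pad_value: float = float("-inf"),
) -> List[float]:
    """Max score within +/- radius frames, never crossing a video boundary.

    Hypothetical helper. Assumes frames are ordered so that each video's frames
    are contiguous. The default pad value (-inf) is neutral for max, so padded
    positions never win.
    """
    result: List[float] = []
    start = 0
    for _, group in groupby(video_ids):
        length = sum(1 for _ in group)
        video_scores = list(scores[start:start + length])
        # Pad the edges of this single video, not of the whole collection.
        padded = [pad_value] * radius + video_scores + [pad_value] * radius
        for i in range(length):
            # Frame i sits at padded index i + radius, so its window is i .. i + 2 * radius.
            result.append(max(padded[i:i + 2 * radius + 1]))
        start += length
    return result


if __name__ == "__main__":
    # Two videos: frames 0-1 belong to video 0, frames 2-3 to video 1.
    print(windowed_max_per_video([0.1, 0.2, 0.9, 0.3], [0, 0, 1, 1], radius=1))
```

In this example, frame 1 keeps the score 0.2 even though the next frame in the collection scores 0.9, because that frame belongs to a different video. Without per-video padding, the window would leak across the boundary.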

Acknowledgements

This paper has been supported by the Charles University Grant Agency (GA UK) project number 1310920, by the Czech Science Foundation (GAČR) project 19-22071Y, and by Charles University grant SVV-260588.

Author information

Correspondence to Ladislav Peška.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Peška, L., Kovalčík, G., Souček, T., Škrhák, V., Lokoč, J. (2021). W2VV++ BERT Model at VBS 2021. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol. 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_46

  • DOI: https://doi.org/10.1007/978-3-030-67835-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67834-0

  • Online ISBN: 978-3-030-67835-7

  • eBook Packages: Computer Science, Computer Science (R0)
