Skip to main content

V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2023)

Abstract

In this paper, we present a new version of our interactive video retrieval system V-FIRST. Besides the existing features of querying by textual descriptions and visual examples, we propose the usage of an image generator that can generate images from a text prompt as a means to bridge the domain gap. We also include a novel referring expression segmentation module to highlight the objects in an image. This is the first step towards providing adequate explainability to retrieval results, ensuring that the system can be trusted and used in domain-specific and critical scenarios. Searching by a sequence of events is also a new addition, as it proves to be pivotal in finding events from memory. Furthermore, we improved our Optical Character Recognition capability, especially in the case of scene text. Finally, the inclusion of relevant feedback allows the user to explicitly refine the search space. All combined, our system has greatly improved user interaction, leveraging more explicit information and providing more tools for the user to work with.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Hezel, N., Schall, K., Jung, K., Barthel, K.U.: Efficient search and browsing of large-scale video collections with vibro. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 487–492. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_43

    Chapter  Google Scholar 

  2. Hoang-Xuan, N., et al.: Flexible interactive retrieval SysTem 2.0 for visual lifelog exploration at LSC 2021 Submitted for review

    Google Scholar 

  3. Hoang-Xuan, N., et al.: Flexible interactive retrieval SysTem 3.0 for visual lifelog exploration at LSC 2022. In: Proceedings of the 5th Annual on Lifelog Search Challenge. LSC 2022, pp. 20–26. Association for Computing Machinery (2022). https://doi.org/10.1145/3512729.3533013

  4. Lokoč, J., Mejzlík, F., Souček, T., Dokoupil, P., Peška, L.: Video search with context-aware ranker and relevance feedback. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 505–510. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_46

    Chapter  Google Scholar 

  5. Nguyen, E.R., Hoang-Xuan, N., Tran, M.T.: Visual-language transformer for referring video object segmentation. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, YouTube-VOS

    Google Scholar 

  6. Nguyen, N., et al.: Dictionary-guided scene text recognition, pp. 7383–7392. https://openaccess.thecvf.com/content/CVPR2021/html/Nguyen_Dictionary-Guided_Scene_Text_Recognition_CVPR_2021_paper.html

  7. Nguyen, T.-N., Puangthamawathanakun, B., Healy, G., Nguyen, B.T., Gurrin, C., Caputo, A.: Videofall - a hierarchical search engine for VBS2022. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 518–523. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_48

    Chapter  Google Scholar 

  8. Schoeffmann, K., Lokoč, J., Bailer, W.: 10 years of video browser showdown. In: Proceedings of the 2nd ACM International Conference on Multimedia in Asia. MMAsia 2020, pp. 1–3. Association for Computing Machinery (2021). https://doi.org/10.1145/3444685.3450215

  9. Tran, M.-T., et al.: V-FIRST: a flexible interactive retrieval system for video at VBS 2022. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 562–568. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_55

    Chapter  Google Scholar 

  10. Trang-Trung, H.P., et al.: Flexible interactive retrieval SysTem 2.0 for visual lifelog exploration at LSC 2021. In: Proceedings of the 4th Annual on Lifelog Search Challenge. LSC 2021, Taipei, Taiwan, pp. 81–87. Association for Computing Machinery (2021). https://doi.org/10.1145/3463948.3469072

Download references

Acknowledgement

This research was funded by Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Nhat Hoang-Xuan or Minh-Triet Tran .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Hoang-Xuan, N. et al. (2023). V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_54

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_54

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics