V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023

Hoang-Xuan, Nhat; Nguyen, E-Ro; Nguyen-Ho, Thang-Long; Pham, Minh-Khoi; Nguyen, Quang-Thuc; Trang-Trung, Hoang-Phuc; Ninh, Van-Tu; Le, Tu-Khiem; Gurrin, Cathal; Tran, Minh-Triet

doi:10.1007/978-3-031-27077-2_54

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In this paper, we present a new version of our interactive video retrieval system, V-FIRST. In addition to the existing features of querying by textual descriptions and visual examples, we propose using an image generator that produces images from a text prompt as a means of bridging the domain gap between textual queries and visual content. We also include a novel referring expression segmentation module that highlights the objects in an image. This is a first step towards providing adequate explainability for retrieval results, so that the system can be trusted and used in domain-specific and critical scenarios. Searching by a sequence of events is another new addition, as it proves pivotal when recalling events from memory. Furthermore, we have improved our Optical Character Recognition capability, especially for scene text. Finally, the inclusion of relevance feedback allows the user to explicitly refine the search space. Taken together, these additions greatly improve user interaction by leveraging more explicit information and giving the user more tools to work with.
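The abstract does not say how relevance feedback is implemented. As an illustration of the general idea only, the sketch below uses the classic Rocchio update on embedding vectors, which is a swapped-in technique and not necessarily what V-FIRST 2.0 does. The function names, the weights `alpha`, `beta`, `gamma`, and the two-dimensional toy frame embeddings are all assumptions made for this example.

```python
from math import sqrt


def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: shift the query toward liked items, away from disliked ones.

    The weights are conventional illustrative defaults, not values from the paper.
    """
    dim = len(query)

    def mean(vectors):
        # Centroid of a list of vectors; zero vector if the list is empty.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos, neg = mean(positives), mean(negatives)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, frames):
    """Return frame names sorted from most to least similar to the query."""
    return sorted(frames, key=lambda name: cosine(query, frames[name]), reverse=True)


# Hypothetical toy embeddings standing in for keyframe features.
frames = {
    "frame_a": [1.0, 0.0],
    "frame_b": [0.0, 1.0],
    "frame_c": [0.7, 0.7],
}
query = [1.0, 0.2]

before = rank(query, frames)
# The user marks frame_b as relevant and frame_a as not relevant.
refined = rocchio_update(
    query, positives=[frames["frame_b"]], negatives=[frames["frame_a"]]
)
after = rank(refined, frames)
```

In this toy setup the feedback step changes the ranking: `frame_a` leads before the update and drops to last afterwards. A real system would apply the same update to high-dimensional embeddings from its own retrieval model.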

Acknowledgement

This research was funded by Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

Author information

Authors and Affiliations

University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung & Minh-Triet Tran
John von Neumann Institute, VNU-HCM, Ho Chi Minh City, Vietnam
Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung & Minh-Triet Tran
Vietnam National University, Ho Chi Minh City, Vietnam
Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung & Minh-Triet Tran
Dublin City University, Dublin, Ireland
Minh-Khoi Pham, Van-Tu Ninh, Tu-Khiem Le & Cathal Gurrin

Corresponding authors

Correspondence to Nhat Hoang-Xuan or Minh-Triet Tran.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

About this paper

Cite this paper

Hoang-Xuan, N. et al. (2023). V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_54

DOI: https://doi.org/10.1007/978-3-031-27077-2_54
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)