
V-FIRST: A Flexible Interactive Retrieval System for Video at VBS 2022

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13142)

Abstract

Video retrieval systems have a wide range of applications across many domains; user-friendly and efficient systems are therefore essential. For VBS 2022, we developed V-FIRST, a flexible interactive video retrieval system that supports two usage scenarios: querying with text descriptions and querying with visual examples. We exploit both the visual and the temporal information in videos to extract concepts related to entities, events, scenes, activities, and motion trajectories for video indexing. The system supports keyword and sentence queries, as V-FIRST can evaluate the semantic similarity between visual and textual embedding vectors. V-FIRST also lets users express queries as visual impressions, such as sketches and 2D spatial maps of dominant colors. Query expansion, elastic temporal video navigation, and intellisense hints further boost the performance of our system.
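The text-to-visual matching described in the abstract can be illustrated with a minimal sketch: text and keyframe embeddings that live in a shared space are compared by cosine similarity, and keyframes are ranked by their score. The function names and the NumPy-based scoring below are illustrative assumptions for exposition, not the actual V-FIRST implementation.

```python
import numpy as np

def cosine_similarity(text_emb: np.ndarray, visual_emb: np.ndarray) -> float:
    """Cosine similarity between a text embedding and a visual embedding.

    Assumes both vectors come from a shared (cross-modal) embedding space.
    """
    return float(np.dot(text_emb, visual_emb) /
                 (np.linalg.norm(text_emb) * np.linalg.norm(visual_emb)))

def rank_keyframes(query_emb: np.ndarray,
                   keyframe_embs: np.ndarray,
                   top_k: int = 5) -> list[tuple[int, float]]:
    """Rank keyframe embeddings (one per row) against a query embedding.

    Returns the top_k (keyframe index, similarity score) pairs,
    highest score first.
    """
    norms = np.linalg.norm(keyframe_embs, axis=1) * np.linalg.norm(query_emb)
    scores = keyframe_embs @ query_emb / norms
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]
```

In a real system the embeddings would come from a joint vision-language model; the ranking step stays the same regardless of which encoder produces the vectors.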



Acknowledgement

The team would like to thank Vinh-Hung Tran and Trong-Thang Pham for the enhanced captioning module; Trong-Tung Nguyen and Huu-Nghia Nguyen-Ho for the human-object interaction module; and Tien-Phat Nguyen and Ba-Thinh Tran-Le for the moving-trajectory retrieval method.

Hoang-Phuc Trang-Trung, Thanh-Cong Le, and Mai-Khiem Tran were funded by Vingroup Joint Stock Company and supported by the Domestic Master/PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA), under codes VINIF.2020.ThS.JVN.03, VINIF.2020.ThS.JVN.05, and VINIF.2020.ThS.JVN.06, respectively.

The work was funded by Gia Lam Urban Development and Investment Company Limited, Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

Author information

Corresponding authors

Correspondence to Minh-Triet Tran or Nhat Hoang-Xuan.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Tran, M.-T., et al. (2022). V-FIRST: A Flexible Interactive Retrieval System for Video at VBS 2022. In: Þór Jónsson, B., et al. (eds.) MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol. 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_55

  • DOI: https://doi.org/10.1007/978-3-030-98355-0_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98354-3

  • Online ISBN: 978-3-030-98355-0

  • eBook Packages: Computer Science; Computer Science (R0)
