Waseda_Meisei_SoftBank at Video Browser Showdown 2024

Hori, Takayuki; Ueki, Kazuya; Suzuki, Yuma; Takushima, Hiroki; Tanoue, Hayato; Sato, Haruki; Takada, Takumi; Kumar, Aiswariya Manoj

doi:10.1007/978-3-031-53302-0_26

Takayuki Hori ORCID: orcid.org/0000-0001-8232-5922¹⁵,¹⁶,
Kazuya Ueki ORCID: orcid.org/0009-0005-1691-1858¹⁴,
Yuma Suzuki¹⁵,
Hiroki Takushima¹⁵,
Hayato Tanoue¹⁵,
Haruki Sato¹⁵,
Takumi Takada¹⁵ &
Aiswariya Manoj Kumar¹⁵

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14557)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

This paper presents our first interactive video browser system, “System 4 Vision”. Because it is built on the system that achieved the highest video retrieval accuracy in the ad-hoc video search (AVS) task of the TRECVID 2022 benchmark, we expect high retrieval accuracy in the Video Browser Showdown competition. The system accepts rich text queries, including complicated queries that combine multiple conditions, because it uses a visual-semantic embedding method, as represented by Contrastive Language-Image Pre-training (CLIP).
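
As an illustration of the visual-semantic embedding idea mentioned in the abstract, the following is a minimal sketch of text-to-video retrieval in a joint embedding space: a text query and each video shot are mapped to vectors, and shots are ranked by cosine similarity to the query. It is not the authors' implementation. The shot identifiers, the three-dimensional vectors, and the function names are hypothetical; in a real system the vectors would come from the text and image encoders of a CLIP-style model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_shots(
    query_embedding: Sequence[float],
    shot_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (shot_id, score) pairs, most similar to the query first."""
    scored = [
        (shot_id, cosine_similarity(query_embedding, emb))
        for shot_id, emb in shot_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: stand-ins for image-encoder outputs of shot keyframes
# and for the text-encoder output of a single free-text query.
SHOT_EMBEDDINGS: Dict[str, List[float]] = {
    "shot_001": [0.9, 0.1, 0.0],
    "shot_002": [0.1, 0.9, 0.1],
    "shot_003": [0.7, 0.6, 0.1],
}
QUERY_EMBEDDING: List[float] = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    for shot_id, score in rank_shots(QUERY_EMBEDDING, SHOT_EMBEDDINGS):
        print(f"{shot_id}: {score:.3f}")
```

Because the whole query is encoded as one vector, a sentence that combines several conditions is handled in the same way as a short query: it is embedded once and compared against every shot embedding.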

Notes

1. https://github.com/mlfoundations/open_clip

References

Frome, A., et al.: DeViSE: a deep visual-semantic embedding model. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), vol. 26 (2013)
Schoeffmann, K., Lokoč, J., Bailer, W.: 10 years of video browser showdown. In: MMAsia 2020: ACM Multimedia Asia (2022)
Faghri, F., Fleet, D.J., Kiros, R., Fidler, S.: VSE++: improved visual-semantic embeddings. arXiv:1707.05612 (2017)
Lee, K.-H., Chen, X., Hua, G., Hu, H., He, X.: Stacked cross attention for image-text matching. In: Proceedings of European Conference on Computer Vision (ECCV) (2018)
Liu, C., Mao, Z., Zhang, T., Xie, H., Wang, B., Zhang, Y.: Graph structured network for image-text matching. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Radford, A., et al.: Learning transferable visual models from natural language supervision. arXiv:2103.00020 (2021)
Mu, N., Kirillov, A., Wagner, D., Xie, S.: SLIP: self-supervision meets language-image pre-training. arXiv:2112.12750 (2021)
Schuhmann, C., et al.: LAION-5B: an open large-scale dataset for training next generation image-text models. In: 36th Conference on Neural Information Processing Systems (NeurIPS) (2022)
Ueki, K., Suzuki, Y., Takushima, H., Okamoto, H., Tanoue, H., Hori, T.: Waseda_Meisei_SoftBank at TRECVID 2022 ad-hoc video search. In: Notebook paper of the TRECVID 2022 Workshop (2022)

Acknowledgments

This work was partially supported by the Telecommunications Advancement Foundation.

Author information

Authors and Affiliations

Meisei University, 2-1-1 Hodokubo, Hino, Tokyo, 191-8506, Japan
Kazuya Ueki
Softbank Corp, Kaigan 1-7-1, Minato-ku, Tokyo, 105-7529, Japan
Takayuki Hori, Yuma Suzuki, Hiroki Takushima, Hayato Tanoue, Haruki Sato, Takumi Takada & Aiswariya Manoj Kumar
Waseda University, 3-4-1, Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
Takayuki Hori

Corresponding author

Correspondence to Takayuki Hori.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Hori, T. et al. (2024). Waseda_Meisei_SoftBank at Video Browser Showdown 2024. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_26

DOI: https://doi.org/10.1007/978-3-031-53302-0_26
Published: 29 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53301-3
Online ISBN: 978-3-031-53302-0
eBook Packages: Computer Science, Computer Science (R0)