ViewsInsight: Enhancing Video Retrieval for VBS 2024 with a User-Friendly Interaction Mechanism

  • Conference paper
MultiMedia Modeling (MMM 2024)

Abstract

ViewsInsight revolutionizes video content retrieval with its comprehensive suite of AI-powered features, enabling users to locate relevant videos using a variety of query types effortlessly. Its intelligent query description rewriting capability ensures precise video matching, while the visual example generation feature provides a powerful tool for refining search results. Additionally, the temporal query mechanism allows users to easily pinpoint specific video segments. The system’s intuitive chat-based interface seamlessly integrates these advanced features.

Notes

  1. https://milvus.io/

  2. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

Acknowledgment

This research was funded by Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

Author information

Corresponding author

Correspondence to Minh-Triet Tran.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vuong, GH. et al. (2024). ViewsInsight: Enhancing Video Retrieval for VBS 2024 with a User-Friendly Interaction Mechanism. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_38

  • DOI: https://doi.org/10.1007/978-3-031-53302-0_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53301-3

  • Online ISBN: 978-3-031-53302-0

  • eBook Packages: Computer Science, Computer Science (R0)
