DOI: 10.1145/3647649.3647716
research-article

Improving Video Retrieval Performance with Query Expansion Using ChatGPT

Published: 03 May 2024

ABSTRACT

In this study, we investigated whether expanding the input query sentence improves video retrieval performance, that is, whether it helps retrieve more appropriate videos. For query expansion, we used ChatGPT, which can generate rich text, to create multiple query sentences that keep the meaning of the original query but express it differently. We conducted a large-scale video retrieval experiment using the latest pre-trained image-text embedding models and confirmed that query expansion improves accuracy over the baseline.
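
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the paraphrases are hard-coded where the study would obtain them from ChatGPT, `embed_text` is a toy bag-of-words stand-in for a pre-trained image-text embedding model, each video is represented by a caption rather than by visual features, and averaging the similarity scores over all query variants is only one plausible fusion rule (the abstract does not say how the expanded queries are combined).

```python
import math
from collections import Counter


def embed_text(text):
    """Toy stand-in for a text encoder: a bag-of-words count vector.

    Assumption: a real system would use a pre-trained image-text model here.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query, paraphrases):
    """Return the original query followed by its distinct paraphrases.

    Assumption: `paraphrases` would come from a language model such as ChatGPT.
    """
    return [query] + [p for p in paraphrases if p != query]


def rank_videos(queries, video_captions):
    """Rank video ids by mean similarity over all query variants (hypothetical fusion)."""
    query_vectors = [embed_text(q) for q in queries]
    scores = {}
    for video_id, caption in video_captions.items():
        video_vector = embed_text(caption)  # caption used as a proxy for video features
        sims = [cosine(q, video_vector) for q in query_vectors]
        scores[video_id] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)


videos = {
    "v1": "a man rides a horse on the beach",
    "v2": "a person riding a horse along the shore",
    "v3": "a cat sleeps on a sofa",
}
queries = expand_query(
    "a man rides a horse on the beach",
    ["a person riding a horse along the shore"],
)
ranking = rank_videos(queries, videos)
print(ranking)
```

In this toy setting, the paraphrase lets a video described with different wording ("person riding", "shore") score as well as one that matches the original query word for word, which is the intuition behind expanding the query before retrieval.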


Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024
480 pages
ISBN: 9798400716720
DOI: 10.1145/3647649

Copyright © 2024 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited