ABSTRACT
In this study, we investigated whether expanding the input query sentences improves video retrieval performance, so that more appropriate videos are retrieved. For query expansion, we used ChatGPT, which can generate rich text, to create multiple query sentences that have the same meaning as the original query but use different expressions. We conducted a large-scale video retrieval experiment using the latest pre-trained image-text embedding models and confirmed that the approach improves accuracy over the baseline.
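The retrieval step with expanded queries can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the scores of the paraphrased queries are combined, so averaging the cosine similarities is an assumption here, and `rank_videos` and `cosine` are hypothetical helper names. The embeddings are assumed to have been computed beforehand by a pre-trained image-text embedding model; the toy vectors below merely stand in for them.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_videos(query_embs: Sequence[Vector], video_embs: Sequence[Vector]) -> List[int]:
    """Rank videos by their mean similarity to all query variants.

    query_embs: one embedding per query sentence (original plus paraphrases).
    video_embs: one embedding per video.
    Returns video indices, best match first.
    """
    fused = []
    for v in video_embs:
        sims = [cosine(q, v) for q in query_embs]
        # Assumption: fuse the per-query scores by simple averaging.
        fused.append(sum(sims) / len(sims))
    return sorted(range(len(video_embs)), key=lambda i: fused[i], reverse=True)


# Toy 2-D vectors standing in for real text and video embeddings.
queries = [[1.0, 0.1], [0.9, 0.0]]             # original query and one paraphrase
videos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three candidate videos
print(rank_videos(queries, videos))  # → [0, 2, 1]
```

Averaging is only one possible fusion rule; taking the maximum score per video would be an equally simple alternative, and neither is confirmed by the source.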