ABSTRACT
Event retrieval from large collections of TV news videos is crucial for efficient information access. It lets researchers, journalists, and the general public quickly locate and analyze relevant content within a large volume of news coverage, supporting informed decision-making and a fuller understanding of significant events. This paper presents an overview of the AI-driven video retrieval task in the Ho Chi Minh City AI Challenge 2023. The competition draws inspiration from two internationally recognized competitions, the Video Browser Showdown (VBS) and the Lifelog Search Challenge (LSC). Participants are tasked with developing AI models to retrieve specific video segments from a diverse dataset drawn from reputable news channels. The dataset comprises videos, keyframes, object detections, CLIP features, and metadata. It is divided into three packs with a total of 1,270 videos, spanning approximately 360 hours of content. The challenge comprises two groups. Group A is open to students, researchers, and practitioners in artificial intelligence and information retrieval, and emphasizes substantial knowledge and experience. Group B is tailored for high school students and focuses on nurturing interest, learning, and engagement among the next generation of AI enthusiasts. The wide variation in query content challenged participants to demonstrate adaptability and creativity in retrieving diverse events from the dataset. The winning teams presented promising solutions that combine artificial intelligence and information retrieval techniques.
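The dataset described above ships precomputed CLIP features alongside the keyframes. As a rough illustration of how such features are commonly used for text-to-keyframe search, the sketch below ranks keyframes by cosine similarity to a query vector. It is not a description of any participating team's system: the function names and the tiny three-dimensional vectors are hypothetical stand-ins, and real CLIP embeddings have hundreds of dimensions and would be produced by the model's text and image encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_keyframes(query_vec, keyframe_vecs, top_k=2):
    """Return (keyframe_index, score) pairs for the top_k most similar keyframes."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(keyframe_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy stand-ins for precomputed keyframe features (assumption: one vector per keyframe).
keyframe_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Toy stand-in for an encoded text query.
query_vec = [0.9, 0.1, 0.0]

ranked = rank_keyframes(query_vec, keyframe_vecs)
for idx, score in ranked:
    print(f"keyframe {idx}: similarity {score:.3f}")
```

In a real system the linear scan would typically be replaced by an approximate nearest-neighbor index, and the returned keyframe indices would be mapped back to video identifiers and timestamps through the dataset's metadata.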
Index Terms
- News Event Retrieval from Large Video Collection in Ho Chi Minh City AI Challenge 2023