Vi-ATISO: An Effective Video Search Engine at AI Challenge HCMC 2023
Pages 960 - 965
Abstract
In this paper, we present the first version of Vi-ATISO, a fast and efficient video search engine on medium-scale datasets. The tool provides several search functions based on text-to-image retrieval, text-to-video retrieval, optical character recognition, and object detection algorithms. With diverse algorithms provided, our system can handle a larger amount of data from the AI Challenge HCMC 2023 and achieve good results. In addition, we feel confident that this search engine can be applied in practice because we also consider user experience during the development process.
References
[1]
Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, and Claudio Vairo. 2023. VISIONE at Video Browser Showdown 2023. In MultiMedia Modeling: 29th International Conference, MMM 2023, Bergen, Norway, January 9–12, 2023, Proceedings, Part I (Bergen, Norway). Springer-Verlag, Berlin, Heidelberg, 615–621. https://doi.org/10.1007/978-3-031-27077-2_48
[2]
John Canny. 1986. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8, 6 (1986), 679–698. https://doi.org/10.1109/TPAMI.1986.4767851
[3]
Han Fang, Pengfei Xiong, Luhui Xu, and Yu Chen. 2021. CLIP2Video: Mastering Video-Text Retrieval via Image CLIP. arXiv preprint arXiv:2106.11097 (2021).
[4]
Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung, Van-Tu Ninh, Tu-Khiem Le, Cathal Gurrin, and Minh-Triet Tran. 2023. V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023. In MultiMedia Modeling, Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, and Phoebe Chen (Eds.). Springer International Publishing, Cham, 652–657.
[5]
Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. CoRR abs/1405.0312 (2014). arXiv:1405.0312http://arxiv.org/abs/1405.0312
[6]
Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, and Rita Cucchiara. 2022. ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval. In International Conference on Content-based Multimedia Indexing. 64–70.
[7]
George A. Miller. 1995. WordNet: A Lexical Database for English. Commun. ACM 38, 11 (nov 1995), 39–41. https://doi.org/10.1145/219717.219748
[8]
Dat Quoc Nguyen and Anh Tuan Nguyen. 2020. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020. 1037–1042.
[9]
Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020. Graph-Based Nearest Neighbor Search: From Practice to Theory. In Proceedings of the 37th International Conference on Machine Learning(ICML’20). JMLR.org, Article 723, 11 pages.
[10]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. CoRR abs/2103.00020 (2021). arXiv:2103.00020https://arxiv.org/abs/2103.00020
[11]
Jerome Revaud, Jon Almazan, Rafael Sampaio de Rezende, and Cesar Roberto de Souza. 2019. Learning with Average Precision: Training Image Retrieval with a Listwise Loss. In ICCV.
[12]
Loris Sauter, Ralph Gasser, Silvan Heller, Luca Rossetto, Colin Saladin, Florian Spiess, and Heiko Schuldt. 2023. Exploring Effective Interactive Text-Based Video Search in vitrivr. In MultiMedia Modeling, Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, and Phoebe Chen (Eds.). Springer International Publishing, Cham, 646–651.
[13]
Konstantin Schall, Nico Hezel, Klaus Jung, and Kai Uwe Barthel. 2023. Vibro: Video Browsing with Semantic and Visual Image Embeddings. In MultiMedia Modeling, Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, and Phoebe Chen (Eds.). Springer International Publishing, Cham, 665–670.
[14]
Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, and Furu Wei. 2023. Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15]
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, and Chunhua Shen. 2019. Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network. CoRR abs/1908.05900 (2019). arXiv:1908.05900http://arxiv.org/abs/1908.05900
[16]
Haoyang Zhang, Ying Wang, Feras Dayoub, and Niko Sünderhauf. 2021. VarifocalNet: An IoU-aware Dense Object Detector. In CVPR.
Index Terms
- Vi-ATISO: An Effective Video Search Engine at AI Challenge HCMC 2023
Recommendations
BlazeSearch: A multimomal semantic search engine for retrieving in-video information for AI Challenge HCMC 2023
SOICT '23: Proceedings of the 12th International Symposium on Information and Communication TechnologyIn the world today, exploring information has become a critical part of modern life. As a result, search engines have shown their ability to enhance the knowledge-seeking process. However, these search engines still focus on searching for websites or ...
Comments
Information & Contributors
Information
Published In

December 2023
1058 pages
ISBN:9798400708916
DOI:10.1145/3628797
Copyright © 2023 ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 07 December 2023
Check for updates
Author Tags
Qualifiers
- Research-article
- Research
- Refereed limited
Conference
SOICT 2023
SOICT 2023: The 12th International Symposium on Information and Communication Technology
December 7 - 8, 2023
Ho Chi Minh, Vietnam
Acceptance Rates
Overall Acceptance Rate 147 of 318 submissions, 46%
Contributors
Other Metrics
Bibliometrics & Citations
Bibliometrics
Article Metrics
- 0Total Citations
- 43Total Downloads
- Downloads (Last 12 months)23
- Downloads (Last 6 weeks)0
Reflects downloads up to 01 Mar 2025
Other Metrics
Citations
View Options
Login options
Check if you have access through your login credentials or your institution to get full access on this article.
Sign inFull Access
View options
View or Download as a PDF file.
PDFeReader
View online with eReader.
eReaderHTML Format
View this article in HTML Format.
HTML Format