research-article

An Interactive System for Multimedia Retrieval in Video Collection with Temporal Integration

Authors:
Kiet Pham Gia, The Saigon International University, Viet Nam (ORCID: 0009-0003-6261-6177)
Hai Binh Tran Le, The Saigon International University, Viet Nam (ORCID: 0009-0008-1679-4737)
Phi Long Nguyen Huynh, The Saigon International University, Viet Nam (ORCID: 0009-0009-1919-6545)
Song Phuong Le Tran, The Saigon International University, Viet Nam (ORCID: 0009-0003-0248-6171)
Long Pham Hoang, The Saigon International University, Viet Nam (ORCID: 0009-0001-6848-7711)
Tri Pham Xuan, The Saigon International University, Viet Nam (ORCID: 0009-0008-0938-3585)
Duong Tran Ham, The Saigon International University, Viet Nam (ORCID: 0009-0008-2088-8023)
Tin Huynh Ngoc, The Saigon International University, Viet Nam (ORCID: 0000-0002-9139-9891)
Kiem Hoang, The Saigon International University, Viet Nam (ORCID: 0009-0007-4003-6736)

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology, December 2023, Pages 989–996. https://doi.org/10.1145/3628797.3629019

Published: 07 December 2023


ABSTRACT

Multimedia retrieval is the process of obtaining digital text, images, videos, and audio segments relevant to an information need from a collection of such resources. As data volumes keep growing, industry and researchers increasingly need scalable, interactive retrieval systems that work efficiently on large collections while maintaining high precision. This paper presents Pumpkin, an interactive multimedia retrieval system first used in the AI Challenge Ho Chi Minh City 2023, an annual video event and moment retrieval competition. The system is built to handle retrieval in a video collection of considerable size and complexity through three primary methods: visual-text association search, object-based search, and audio speech instance search. In addition, the system integrates a temporal workflow that searches for conceptually related shots in sequence, filtering out out-of-context results and promoting suitable ones as the user supplies more details. The system also emphasizes user experience: a clean, intuitive interface with simplified user-side functionality makes information retrieval, whether simple or complex, more efficient in a very large collection of multimedia data.
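The abstract does not say how the temporal workflow is implemented, so the following is only a minimal sketch of one common way to rank shots against a sequence of queries. It assumes each query has already been scored against every frame of a video (for example by a visual-text model); the function name `temporal_rerank`, the `stage_scores` layout, and the `window` parameter are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def temporal_rerank(
    stage_scores: List[List[float]],
    window: int = 3,
) -> List[Tuple[int, float]]:
    """Rank start frames by the best in-order chain of matches.

    stage_scores[k][t] is an assumed similarity between the k-th query
    and frame t of one video (frames are listed in temporal order).
    """
    if not stage_scores:
        return []
    n_frames = len(stage_scores[0])
    # best[t]: best total score of a chain that starts at frame t and
    # covers the remaining queries; filled from the last query backwards.
    best = list(stage_scores[-1])
    for scores in reversed(stage_scores[:-1]):
        nxt = best
        best = []
        for t in range(n_frames):
            # The next query must match strictly later, within `window` frames.
            later = nxt[t + 1 : t + 1 + window]
            # With no later frame left, the chain cannot be completed.
            best.append(scores[t] + max(later) if later else float("-inf"))
    ranked = sorted(enumerate(best), key=lambda p: p[1], reverse=True)
    return [(t, s) for t, s in ranked if s != float("-inf")]


# Two queries over four frames: frame 0 matches the first query best on its
# own, but frame 2 is directly followed by a strong match for the second.
scores = [
    [0.9, 0.1, 0.8, 0.2],  # first query vs. frames 0..3
    [0.1, 0.2, 0.1, 0.9],  # second query vs. frames 0..3
]
ranking = temporal_rerank(scores, window=2)
print([frame for frame, _ in ranking])
```

In this toy input, adding the second query moves frame 2 ahead of frame 0, which illustrates how extra detail from the user can demote results that look good in isolation but lack the expected follow-up.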


Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interactive systems and tools
2. Information systems
  1. Information retrieval


Published in

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information system
interactive retrieval
temporal search
video event retrieval
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 147 of 318 submissions, 46%
