DOI: 10.1145/3628797.3629019
research-article

An Interactive System for Multimedia Retrieval in Video Collection with Temporal Integration

Published: 7 December 2023

ABSTRACT

Multimedia retrieval is the process of obtaining text, images, videos, and audio segments in digital form that are relevant to an information need from a collection of such resources. As data volumes keep growing, industry and researchers increasingly need scalable, interactive retrieval systems that work efficiently on extensive collections while maintaining high precision. This paper presents Pumpkin, an interactive multimedia retrieval system first used in the AI Challenge Ho Chi Minh City 2023, an annual video event and moment retrieval competition. The system is built to handle retrieval in a video collection of considerable size and complexity through three primary methods: visual-text association search, object-based search, and audio speech instance search. Additionally, the system integrates a temporal workflow that searches for conceptually related shots in sequence, filtering out out-of-context results and promoting suitable ones as the user supplies more details. The system also places great emphasis on user experience, incorporating a clean and intuitive interface with simplified user-side functionality, which makes information retrieval more efficient for both simple and complex queries in a huge collection of multimedia data.
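The abstract does not specify how the temporal workflow combines results, so the following is only a minimal sketch of one plausible approach, not the Pumpkin implementation. It assumes each query stage has already produced scored hits of the form (video_id, frame_index, score), and it chains a hit to the best-scoring later hit in the same video within a fixed frame window. The function name `temporal_rerank`, the `window` parameter, and the additive scoring are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical hit format: (video_id, frame_index, similarity_score)
Hit = Tuple[str, int, float]


def temporal_rerank(stages: List[List[Hit]], window: int = 100) -> List[Tuple[str, int, float]]:
    """Chain hits across consecutive query stages (illustrative only).

    A chain survives a stage only if that stage has a hit in the same
    video that occurs after the chain's last frame and no more than
    `window` frames later. Returns (video_id, start_frame, total_score)
    tuples sorted by total score, best first.
    """
    if not stages:
        return []

    # Each chain is (video_id, first_frame, last_frame, total_score).
    chains = [(v, f, f, s) for v, f, s in stages[0]]

    for hits in stages[1:]:
        # Group this stage's hits by video for quick lookup.
        by_video: Dict[str, List[Tuple[int, float]]] = {}
        for v, f, s in hits:
            by_video.setdefault(v, []).append((f, s))

        extended = []
        for v, first, last, total in chains:
            # Keep only hits that follow the chain within the window.
            candidates = [
                (s, f) for f, s in by_video.get(v, [])
                if last < f <= last + window
            ]
            if candidates:
                s, f = max(candidates)  # best-scoring follow-up hit
                extended.append((v, first, f, total + s))
        # Chains with no valid follow-up are dropped as out of context.
        chains = extended

    ranked = sorted(chains, key=lambda c: c[3], reverse=True)
    return [(v, first, total) for v, first, _, total in ranked]


if __name__ == "__main__":
    first_query = [("v1", 10, 0.5), ("v2", 5, 0.75)]
    second_query = [("v1", 40, 0.25), ("v2", 500, 0.5)]
    print(temporal_rerank([first_query, second_query], window=100))
```

In this toy run the hit in "v2" scores highest for the first query but is dropped, because its follow-up lies far outside the window. The chain in "v1" survives, which mirrors the idea of removing out-of-context results as the user adds detail.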

      • Published in

        SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
        December 2023
        1058 pages
        ISBN: 9798400708916
        DOI: 10.1145/3628797

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 147 of 318 submissions, 46%