DOI: 10.1145/3628797.3628942
research-article

BlazeSearch: A multimodal semantic search engine for retrieving in-video information for AI Challenge HCMC 2023

Published: 07 December 2023

Abstract

Searching for information has become a central part of modern life, and search engines have proven their value in supporting it. However, mainstream search engines still focus on retrieving web pages or images. Finding information inside videos remains comparatively underexplored, and further experimentation and study are needed to extend search engines in this direction. In this study, we investigate the potential of in-video information search by introducing BlazeSearch, a multimodal search engine that retrieves video frames from a simple text query. BlazeSearch builds on the OpenCLIP model, which performs strongly on image-text retrieval, to make its results reliable and accurate. Furthermore, we optimize search speed and provide an easy-to-use, fully functional user interface so that users have a pleasant experience.
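
The retrieval step described in the abstract, matching a text query against video frames in a shared embedding space, can be illustrated with a minimal sketch. It assumes that frame and query embeddings have already been produced by a CLIP-style model. The toy 3-dimensional vectors, the frame identifiers, and the function names `cosine_similarity` and `search_frames` are hypothetical and not taken from the paper, and the brute-force scan below is only a stand-in for whatever indexing BlazeSearch actually uses.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def search_frames(
    query_vec: List[float],
    frame_vecs: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank frames by cosine similarity to the query (linear scan)."""
    scored = [
        (frame_id, cosine_similarity(query_vec, vec))
        for frame_id, vec in frame_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed embeddings; a real system would obtain these
# from an image encoder (frames) and a text encoder (query).
FRAMES = {
    "video01/frame0001": [1.0, 0.0, 0.0],
    "video01/frame0002": [0.0, 1.0, 0.0],
    "video02/frame0001": [0.6, 0.8, 0.0],
}
QUERY = [1.0, 0.0, 0.0]

results = search_frames(QUERY, FRAMES, top_k=2)
for frame_id, score in results:
    print(f"{frame_id}\t{score:.3f}")
```

A linear scan like this grows with the number of frames, so large collections usually rely on an approximate nearest-neighbor index instead; the abstract does not state which approach BlazeSearch takes.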


Published In

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN:9798400708916
DOI:10.1145/3628797

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CLIP model
  2. image-text retrieval
  3. in-video information search
  4. lifelog events
  5. search engine
  6. searching speed optimization
  7. user experience enhancement

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SOICT 2023

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%
